Chapter I
INTRODUCTION 

In  the  years  between  1950  and  1964,  which  Benjamin  [3]  and 
others  refer  to  as  the  second  generation  of  computers,  programs 
were run in a "job shop" fashion. Each organizational function
coded its own programs and utilized, maintained, and safeguarded
its own data bases. With the advent of the third generation of
computers, larger main memories, sophisticated operating systems,
and procedural languages became more readily available.
Organizations became aware of the benefits that could be gained
by combining their data for different functions (e.g., payroll,
employee benefits, and accounting). This new concept, that of
different functions sharing the same data base, gave birth to the
field of Data Base Management (DBM).

According  to  Fry  [17],  ".  . . The  original  impetus  for 
generalized  processing  came  from  the  military  because  of  the  large 
volume  of  data  that  had  to  be  processed  quickly."  He  states  that 
there  were  other  motivations  for  the  development  of  the  Data  Base 
Management System (DBMS) technology. These motivations are:

1)  decrease  the  programming  effort  and  lead  time  from 
data  base  application  design  to  operational  capability; 

2)  allow  a non-programmer  to  interface  with  the  data  base; 

3)  decrease  the  effort  required  to  respond  to  changing 
requirements; and

4)  provide central control of the data bases.

Data base technology has now reached the point where an
organization may buy off-the-shelf systems from software houses
or computer manufacturers. According to Datapro Research
Corporation [11]: "Today, data base management systems and data
communications monitors are running neck and neck as the subjects
that generate more inquiries to Datapro's Telephone Consulting
Service than any other software topic." In conjunction with
industrial sector accomplishments in the DBM field, the research
and development sector has been working diligently to solve some
of the software and hardware problems. Software-oriented
researchers have been looking at such topics as hashing
algorithms, logical data models, optimum reorganization of
physical structures, and search mechanisms.

The  research  addressed  here  is  primarily  oriented  toward 
the  hardware  aspects  of  the  DBM  field,  although  it  is  somewhat 
concerned  with  the  software  area.  The  general  problem  is  to 
determine  the  "functions"  performed  in  DBM  through  modeling  and 
then  analyze  how  some  of  them  might  be  implemented  in  hardware. 

In Chapter II, a selective set of significant papers is discussed.
These papers are divided into two groups. The first group
consists of papers pertaining to the modeling of data for DBM
purposes. The second group consists of papers pertaining to
hardware efforts in the DBM field. Chapter III contains a
description of the specific problem this research addresses and a
methodology for its solution. Presented in the fourth chapter is
an attempt to derive a mathematical base for DBM which is
applicable from the user's level down to the bit level of a
digital computer. The fifth chapter contains illustrations of
modeling DBM by using set theory and the developed mathematical
base. Through this modeling, the "functions" of DBM can be
discerned. The sixth chapter contains a multilevel description of
a proposed hardware implementation of some of these "functions."
Presented in the seventh chapter is a mathematical method that is
utilized for evaluating the proposed hardware versus a sequential
computer in the performance of four generic jobs of DBM. Finally,
Chapter VIII presents the conclusions reached from this research
and identifies those areas where future research should be
undertaken.

Chapter II

REVIEW  OF  THE  LITERATURE 


INTRODUCTION 

Hardware in the field of Data Base Management (DBM) requires
further investigation. This need is expressed in the following
quote from Berra [4]:

Vast computer resources are required for the managing
of large data bases. With hardware costs coming down,
and software and personnel costs going up, it is
important that one investigate the application of
associative devices to the field of data base management
to ascertain what gains might be made.

Su  and  Lipovski  [38]  sum  up  the  situation  in  the  following  manner: 

The hardware limitations of conventional Von Neumann
computers tend to straightjacket our approach to
non-numerical processing in large data bases. Recent
progress made in memory technology and integrated
circuits has opened a door to new computer and memory
device design and offers an opportunity to the
non-numerical data processing profession to examine more
closely what the problems are in data processing which
are inherent in the existing hardware, why they are
there and how they can be resolved or avoided if we
are given the chance to design a new machine. In
recent years, the cost of hardware is continuously
decreasing whereas the cost of software does not
enjoy the same benefit from technological progress.

It seems to be perfectly reasonable to implement
those frequently used software functions in hardware
to increase the machine's operating efficiency and,
at the same time, to eliminate the problems introduced
by the mismatch of hardware with applications.

To accomplish what Su and Lipovski are suggesting, i.e.,
". . . to implement those frequently used software functions in
hardware . . . ," these functions must first be rigorously defined.
Su and Lipovski do not spell out what these "frequently used
software functions" are for non-numerical processing. The
literature in the DBM field also lacks specifics about them.
Since they are not generally known in the required detail, a
method for determining them must be settled upon before they can
be implemented in hardware.

A methodology for determining these functions can be
described as a modeling approach. The logic is to perceive or
derive a mathematical language which can be used as a base to
model DBM. Since DBM is concerned with data ranging from the
user's view in communicating with a DBMS down to the bit
representation of data stored within the computer and on its
storage devices, it would be desirable for the mathematical
language developed to be useful in modeling the same span of
data levels. It is believed that, through the process of
modeling DBM, its functions or levels can be discerned in the
required detail.

This modeling will also assist in determining future hardware.
The functions discerned by this modeling can, it is hoped, be used
to classify the hardware that has already been designed and/or
constructed. The by-product of this classification is the
knowledge of which functions have not been implemented in
hardware.

The first portion of this chapter contains a literature
review of a selective set of significant papers describing models
developed for DBM. These models are reviewed to determine if one
or more of them can provide the mathematical language or model
described above. The second portion of this chapter provides a
literature review of some of the more significant papers
concerning hardware in the DBM field. The hardware descriptions
presented will provide a baseline for future developments.

MODELS IN DATA BASE MANAGEMENT

Most of the modeling work in the DBM field has been concerned
with either the lowest level of data or general concepts and
specifications at the highest level. There seems to be a dearth
of information on modeling that encompasses all levels. In this
section a review of selected papers concerning levels of data and
DBMSs is provided.

HIGH LEVEL MODELING

Representative reports at the highest level are the Data
Base Task Group reports [8,9] and the Joint GUIDE-SHARE report
[23]. As an example of this level, the Data Base Task Group report
[8] provides a modeling of DBMSs. The modeling defines the tasks
of a Data Administrator, the different levels of users, how to
achieve privacy and integrity, and, of course, data definition and
manipulation languages. The relationships amongst data are
discussed by describing the logical structure of the sequential,
tree, network, and cycle data structures. Different levels
of data such as data-items, arithmetic data, string data,
database-keys, vectors, repeating groups, records, etc. are also
defined. These reports, which pertain to the highest level of
DBMS, do not go into any discussion of how a DBMS should or could
be implemented. They also do not provide the detail necessary to
model DBM as discussed above. Hence, they were not used as the
base for the mathematical language desired.


DATA  RELATIONSHIP  MODELING 

Two examples of another level of modeling in DBM can be
found in the papers of Childs [7] and Codd [10], which address the
relationships of data. Codd addresses the data relationship
area by defining a relation as a subset of the Cartesian product
of n sets. Each set can be viewed as an attribute in a data base.
Codd defines the concepts of normalized and unnormalized sets,
the operations of projection, join, and composition, and strong
and weak redundancy relationships. These concepts and their
operations are defined and discussed at a very high level. They
are not interfaced with a higher or lower level of
implementation, e.g., how the user would interface with these
concepts and how they could physically be implemented on a
computer.
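
The idea of a relation as a subset of a Cartesian product, together with projection and join, can be made concrete with a small sketch. The attribute names, the sample data, and the helper functions `project` and `natural_join` below are hypothetical and chosen only for illustration; they are not taken from Codd's paper.

```python
from itertools import product

def project(relation, attrs, keep):
    """Project a relation (a set of tuples) onto the attributes named in keep."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in relation}

def natural_join(r, r_attrs, s, s_attrs):
    """Join two relations on every attribute name they have in common."""
    common = [a for a in r_attrs if a in s_attrs]
    s_extra = [a for a in s_attrs if a not in common]
    out_attrs = list(r_attrs) + s_extra
    result = set()
    for row_r in r:
        for row_s in s:
            if all(row_r[r_attrs.index(a)] == row_s[s_attrs.index(a)]
                   for a in common):
                result.add(row_r + tuple(row_s[s_attrs.index(a)] for a in s_extra))
    return out_attrs, result

# Hypothetical sample relations (illustrative data only).
supply_attrs = ("supplier", "part")
supply = {("S1", "gear"), ("S1", "bolt"), ("S2", "gear")}
usage_attrs = ("part", "project")
usage = {("gear", "P1"), ("bolt", "P2")}

# A relation is a subset of the Cartesian product of its underlying sets.
is_relation = supply <= set(product({"S1", "S2"}, {"gear", "bolt"}))

suppliers = project(supply, supply_attrs, ["supplier"])
joined_attrs, joined = natural_join(supply, supply_attrs, usage, usage_attrs)
```

The sketch stays at the same single level as the concepts it illustrates: it says nothing about how a user would phrase such operations or how the tuples would be laid out in storage.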

Childs' paper also addresses the data relationship area.
His approach is based on set theory. Childs states: ". . . any
relation can be expressed in set theory as a set of ordered
pairs and since set theory provides a wealth of operations for
dealing with relations, a set-theoretic data structure appears
worth investigating." He recognizes that order in a relationship,
say $\langle a,b \rangle$, can be represented with sets alone,
i.e., $\langle a,b \rangle = \{\{a\},\{a,b\}\}$, but that such
representations become difficult to handle as the relationships
become greater than binary. To be able to maintain sets and set
theory to model data structures and to keep the concept of order
necessary for describing relationships, Childs introduces what he
defines as a
complex:

"Definition 1. Any two sets A and B form a complex
(A;B) iff $(\exists X)(\exists Y)(X \in \{A,B\})(Y \in \{A,B\})$
$[(\forall x \in X)(\exists i \in N)(\{\{x\},i\} \in Y)$
and $(\forall y \in Y)(\exists j \in N)(\exists x \in X)(\{\{x\},j\} = y)]$,"
where N is the set of natural numbers.

The following is an alternate manner of defining the concept:
Any two sets A and B form a complex (A;B) if and only if there
exists a set X and a set Y such that X is an element of the set
$\{A,B\}$ and Y is an element of the set $\{A,B\}$. In addition,
for every element x contained in X there exists an element i
contained in N (the set of natural numbers) such that the set
containing the set $\{x\}$ and the element i, i.e.,
$\{\{x\},i\}$, is contained in the set Y. Similarly, for every
element y contained in Y there exists an element j contained in
N and an element x contained in the set X such that the set
containing the set $\{x\}$ and the element j, i.e.,
$\{\{x\},j\}$, is equal to y.

An example of a complex can be constructed. Let $X = \{T,A,C\}$
and $Y = \{\{\{A\},2\}, \{\{C\},1\}, \{\{T\},3\}\}$. Then the
complex (X;Y) is equal to
$(\{C,T,A\}; \{\{\{A\},2\}, \{\{C\},1\}, \{\{T\},3\}\})$. The
rest of Childs' paper deals with the proving of theorems and the
defining of additional concepts involving complexes.
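
The two conditions of the definition can be checked mechanically. The sketch below is only an illustration under stated assumptions: each set of the form {{x}, i} is represented by the Python pair `(x, i)`, the natural numbers are taken to be the nonnegative integers, and the function names `is_complex` and `in_order` are hypothetical. It tests one given assignment of X and Y rather than trying both members of {A,B}.

```python
def is_complex(X, Y):
    """Check both conditions of the definition for a given choice of X and Y.

    Y is a set of (element, position) pairs; each pair stands in for the
    set {{x}, i} of the definition (a representation assumed here).
    """
    tagged = {x for (x, i) in Y}
    # Condition 1: every x in X appears in Y with some natural-number tag.
    every_x_tagged = all(x in tagged for x in X)
    # Condition 2: every y in Y is a tagged element drawn from X.
    every_y_valid = all(
        x in X and isinstance(i, int) and i >= 0 for (x, i) in Y
    )
    return every_x_tagged and every_y_valid

def in_order(Y):
    """List the elements of Y by ascending tag, recovering the order."""
    return [x for (x, i) in sorted(Y, key=lambda pair: pair[1])]

# The example from the text: X = {T, A, C} with tags C=1, A=2, T=3.
X = {"T", "A", "C"}
Y = {("A", 2), ("C", 1), ("T", 3)}

example_is_complex = is_complex(X, Y)
example_order = in_order(Y)
```

Reading the elements back by ascending tag shows how the unordered set X acquires an order from Y alone, which is the purpose of the construction.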

Like Codd's paper, Childs' work does not venture beyond the
level of set theory. No attempt is made to show how the
modeling language applies in describing, for instance, the
implementation process of building a DBMS. Childs' paper
concludes by describing two example data bases and the time
required to perform some simple queries on the data.

Of these two papers, Childs' concept of complexes has the
highest potential of providing the mathematical language necessary
for the purposes previously specified. However, it was not used
because the additional concepts involving complexes hindered the
logic needed for the model desired. Codd's model is described at
only one level of DBM; it was therefore too limited for the
mathematical model required and was also not used.

DATA INDEPENDENT ACCESSING MODEL

Another  level  of  modeling  DBMSs  is  the  Data  Independent 
Accessing  Model  (DIAM)  described  by  Senko,  Altman,  Astrahan  and 
Fehder  [36].  DIAM  is  composed  of  four  models.  They  are  the  Entity 
Set  Model,  the  String  Model,  the  Encoding  Model,  and  the  Physical 
Device  Level  Model. 

The Entity Set Model is at a very high level. It deals with
the entry mechanism between the world of tangible information
external to the data base system and the DIAM. Information can
therefore be stored in a DBMS by describing the information in
terms of the Entity Set Model Name Organization. DIAM will
catalogue the information as described by these terms. The basic
building block of the model is a triplet of Entity Name Set Name/
Role Name/Entity Name, where Entity Name is drawn from the "Entity
Name" Set. An example of this triplet concept would be to have
"Part" as an Entity Name Set Name, "Part Supplied" as a Role Name
and "Gear" as an Entity Name. The triplet would therefore be
Part/Part Supplied/Gear. The authors believe that Entity Names
and Entity Name Set Names are more useful building blocks of
structured information than fields, records, and files.

The  next  lower  level  model  is  the  String  Model.  The  String 
Model  is  used  to  describe  how  to  traverse  within  or  through  Entity 
Sets.  The  String  Model  is  broken  down  into  A-strings,  E-strings, 
and L-strings. A-strings are defined within the same Entity Set
Description by order. E-strings are defined between the same Set
Description  by  order.  L-strings  are  not  restricted,  but  connect 
elements  based  on  a match  between  Entity  names  for  the  same  Entity 
that  occurs  in  each  of  the  elements  (which  may  be  an  A-string,  an 
E-string,  or  another  L-string). 

The  Encoding  Model  provides  a bit  level  representation  for 
the  strings  in  the  String  Model.  The  heart  of  the  Encoding  Model 
is  the  Basic  Encoding  Unit  (BEU).  The  BEU  provides  one  basic 
format  for  encoding  all  strings  and  triplets. 

The  lowest  level  modeled  is  the  Physical  Device  Level  Model. 
This  level  models  the  placing  of  Contiguous  Data  Groups  (CDG), 
which  are  similar  to  data  records,  onto  a physical  device  (e.g. 
disk or drum). A CDG is made up of a set of BEUs. As stated by
the authors, the DIAM, like many other models of data base systems,
". . . does not cover all aspects of system description, but it
does appear to describe and provide defined, detailed interfaces
to a broad range of components of data base systems."

DIAM provides an excellent set of models for describing data,
their relationships within a DBMS and some DBM components. However,
it does not appear to provide any mathematical model with a
language that can easily be extended to all aspects of DBM.

ATTRIBUTE BASED MODEL

The last modeling technique is probably the one best known
to the data base community. It is described by Hsiao and Harary
[22]. This is the Attribute Based Model where there exists ". . .
a set A of 'attributes' and a set V of 'values.' A 'record' R
is a subset of the Cartesian Product A x V in which each attribute
has one and only one value." Thus, R is a set of ordered pairs
of the form: (an attribute, its value). The paper later states
that a file is a set of records. Given these definitions
and other basic concepts (such as indices, key words, (A x V)
ordered pairs, and key word lists), the authors modeled inverted
files, multilist files, and index sequential files. They then
proceed to define two functions (a directory search function and
a file search function) and develop within their model a Simple
Retrieval Algorithm (serial processing of lists) and a General
Retrieval Algorithm (parallel processing of lists). Their modeling,
like that of Codd and Childs, stays at one level and no attempt
is made to link it with any upper or lower levels. Thus, this
modeling was not further investigated to determine whether or not
it fulfilled the requirements for the desired mathematical model.
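
The definitions of record, file, and inverted file can be illustrated with a brief sketch. This is not the Simple or General Retrieval Algorithm of Hsiao and Harary; it is a plain conjunctive lookup written under the assumption that a record is a mapping from attribute to value and that a keyword is an (attribute, value) pair. The function names, record addresses, and sample data are hypothetical.

```python
from collections import defaultdict

def make_record(pairs):
    """Build a record from (attribute, value) pairs.

    Rejects an attribute that is given two different values, since each
    attribute must have one and only one value.
    """
    record = {}
    for attr, value in pairs:
        if attr in record and record[attr] != value:
            raise ValueError(f"attribute {attr!r} has more than one value")
        record[attr] = value
    return record

def build_inverted_index(file_records):
    """Map each (attribute, value) keyword to the addresses of records holding it."""
    index = defaultdict(set)
    for address, record in file_records.items():
        for keyword in record.items():
            index[keyword].add(address)
    return index

def retrieve(index, keywords):
    """Return addresses of records that contain every keyword given."""
    lists = [index.get(k, set()) for k in keywords]
    return set.intersection(*lists) if lists else set()

# A file is a set of records; here records are keyed by a hypothetical address.
personnel = {
    0: make_record([("name", "Adams"), ("dept", "payroll")]),
    1: make_record([("name", "Baker"), ("dept", "accounting")]),
    2: make_record([("name", "Clark"), ("dept", "payroll")]),
}
index = build_inverted_index(personnel)
payroll_hits = retrieve(index, [("dept", "payroll")])
clark_hits = retrieve(index, [("dept", "payroll"), ("name", "Clark")])
```

The directory (the index) is consulted first and the file itself is touched only for the addresses returned, which mirrors the split between a directory search function and a file search function.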

SPECIAL HARDWARE AND DATA BASE MANAGEMENT

Some researchers have recognized the importance of
investigating special machines for non-numerical processing. These
alternate machines have included Associative Memories (AM),
Associative Array Processors (AAP) and special hardware for
intersecting and combining sets or lists of data. The majority of
this hardware deals with the data in the data base or, more
specifically, the occurrences of files or relationships. The
descriptions of these machines provide a baseline for future
hardware designs. An overview of other developed hardware can be
found in Thurber's paper [39].

ASSOCIATIVE MEMORIES AND PROCESSORS

Some of the work of applying associative devices to DBM was
directly concerned with associative memories and processors.
These research efforts were conducted by DeFiore and Berra [13,14],
Moulder [31], and Linde, Gates and Peng [28]. DeFiore and Berra
[14] developed mathematical equations for comparing an inverted
list implementation of a data base against an associative memory
(AM) implementation. A basic assumption that DeFiore and Berra
made was that the data base was totally contained within the AM.
This assumption was made because as the data base becomes larger
than one AM load, the system may become I/O bound and therefore
will not utilize the AM hardware to its full capability.
Moulder [31] recognized this problem and proposed a system
consisting of a SIGMA 5 sequential computer, a STARAN AAP and a
parallel head-per-track disk (PHD) with a capacity of 12,288
bytes/surface. The system was designed such that two revolutions
(39 msec/revolution) of the disk allowed a user to process a
simple job pertaining to the total data base. However, as the data
base grows larger than one surface of the PHD, the number of
revolutions increases by two revolutions per surface.
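
The scaling just described can be written as a short worked calculation. It assumes, as a simplification not stated in the source, that a data base of D bytes fills PHD surfaces completely and that the two-revolution cost applies to each surface in turn; the symbols D, s, and T are introduced here only for illustration.

```latex
s = \left\lceil \frac{D}{12{,}288\ \text{bytes}} \right\rceil, \qquad
T(s) = 2 \times 39\ \text{msec} \times s = 78\,s\ \text{msec}
% Example: D = 49,152 bytes gives s = 4 and T(4) = 312 msec.
```

Under these assumptions the time for a simple job grows linearly with the number of surfaces occupied, which is the I/O-bound behavior noted above.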

Linde, Gates, and Peng [28] proposed a more powerful system
than those described above. They compared a hypothetical
machine against the IBM 370/145 for the data management jobs of
data retrieval, update, and search. Their machine differed from
the others in that it was byte-serial, word-parallel rather than
bit-serial, word-parallel. The other major difference is that,
rather than to a PHD, this machine was connected to a large
(500,000 bytes) hypothetical random access data memory with a
transfer rate of 1.6 billion bytes/sec.
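
For a sense of scale, the quoted figures imply that the entire hypothetical data memory could in principle be transferred in well under a millisecond. The following is a back-of-the-envelope calculation only; it ignores access and control overhead, which is not quantified here, and the symbol t is introduced for illustration.

```latex
t = \frac{5 \times 10^{5}\ \text{bytes}}{1.6 \times 10^{9}\ \text{bytes/sec}}
  = 3.125 \times 10^{-4}\ \text{sec} \approx 0.31\ \text{msec}
```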

DISTRIBUTED  LOGIC 

The above research is representative of the kind of work
being done on "conventional" AMs/APs in DBM. The common thread
of this work is that data are processed within the AM/AP. Another
approach being looked at is what might be called distributed logic
systems. The main ingredient of this approach is the placement
of logic on the secondary storage device. This allows data to be
searched in place rather than being moved into main memory for
searching.

Work by Parker [34] and Minsky [30] discusses logic on rotating
devices. Both of their architectures are designed to search on
only part of the data base. Therefore, they are sometimes called
"partially associative memories." These architectures differ in
at least two major ways:

1)  Parker's  structure  can  have  keys,  holes,  and  data 
on  the  same  track  while  Minsky's  structure  has 
keys  on  a separate  disk  from  the  data,  and 

2)  Parker's  structure  allows  for  variable  length 
keys,  holes,  and  data  while  Minsky's  structure 
has  fixed  size  cells  for  keys  and  data. 

In addition to the above there is ongoing work in "fully
associative memories" that utilize rotating devices. Some of this
work is being performed by Parhami [35], Healy, et al. [20] (see
also [6, 15, 19]), Ozkarahan, et al. [52, 35] and Lin, et al. [27].
Their work, except for Parhami's, is partly described in the names
given to their architectures. Healy's architecture is named CASSM
for Content Addressed Segment Sequential Memory, Ozkarahan's
architecture is named RAP for Relational Associative Processor
and Lin's architecture is named RARES for Rotating Associative
Relational Store. In addition to rotating devices there is a
distributed logic system design named ECAM [1] for Extended
Content Addressed Memory.

In Parhami's structure the data are loaded and read in a
word-serial  bit-parallel  fashion.  The  bits  of  a character  are 
loaded  on  parallel  tracks.  Since  there  is  a head  per  track,  it 
is  implied  that  data  are  read  bit-parallel  and  word-serial. 

In the CASSM and RAP structures the data are stored serially
on a track. The tracks are read simultaneously. Therefore, in
reference to Parhami's structure, this can be referred to as word
parallel. In the CASSM structure microcode instructions are loaded
with the data on secondary storage. The RAP structure stores and
maintains its data directory information on its secondary storage.
This secondary storage for the RAP system is currently being
changed from a head-per-track device to charge coupled devices.

The RARES system stores its data bit-serial and byte-parallel.
The domains in a data relation are stored serially and in parallel
on its rotating device. For instance, three parallel tracks may
contain a domain that is three bytes long. The next domain in
a relation may be on the next three or four parallel tracks
followed serially by another domain in the relation. This
configuration is called an "orthogonal storage layout." It
reportedly [27] "allows a high output rate of selected tuples to
be attained even when a sort order must be preserved."

Finally, there exists a distributed logic system design
called ECAM. It is being designed as a special purpose machine
(with  a storage  capacity  on  the  order  of  10^  bits)  to  be  attached 
to one or more host computers. It is structured for relational
data bases and will have a repertoire of associative search and
arithmetic operations. The content addressable memory (for which
charge coupled devices are currently being considered) will
contain up to 250,000 associative words of 4,096 bits in length.
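
The stated word count and word length imply a maximum capacity for the content addressable memory. The arithmetic below uses only the two figures quoted above; the conversion to bytes assumes 8 bits per byte.

```latex
250{,}000\ \text{words} \times 4{,}096\ \text{bits/word}
  = 1.024 \times 10^{9}\ \text{bits}
  = 1.28 \times 10^{8}\ \text{bytes}
```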

SEGMENTED  ASSOCIATIVE  MEMORIES 

Another  approach  is  that  of  segmented  associative  memories 
by  love  [29],  Instead  of  placing  logic  on  a disk  or  drum,  Love 


divides  the  logic  into  10  AMs  which  share  a data  base  stored  on 
a shift  register  bulk  memory.  There  sure  two  processors  which 
manage  data  and  the  AMs.  Control  processors  manage  the  data  and 
instruction  processors  manage  the  AMs.  As  stated  by  Love,  this 
network  structure  ".  . . permits  each  associative  memory  to  be 
assigned  to  a data  transfer  channel  for  bulk  memory,  and  also 
permits  the  associative  memories  to  be  connected  together  in 
parallel  in  various  combinations."  He  goes  on  to  state  that, 
"This  capability  makes  it  possible  for  several  of  the  associative 
memories  to  operate  as  a single  large  associative  memory  when  the 
amount  of  data  requires  it,  or  to  operate  individually  in  simul- 
taneous independent  operation . " 


SUPPORT  HARDWARE 


In all of the literature discussed above, one common thread
has been that the hardware essentially deals with the data
in the data base, i.e., the occurrences of files or relationships.
Other hardware has been designed more for supporting a DBMS in
performing its functions than for manipulating its data. Three
examples in this area are the hardware structures proposed by Baum
and Hsiao [2], Hollaar [21], and Singhania and Berra [37, 53].
Baum and Hsiao have developed a hardware architecture for an
attribute oriented DBM machine whose primary concern is to provide
data security. The hypothetical machine is made up of a directory
memory, an intersector, a mass memory, and a command
pre-processor. The directory memory contains directory
information. The mass memory contains the data base itself. The
intersector is the directory memory processor. One of its main
functions is to intersect sets of data from the directory memory.
The command pre-processor is the element which directs all the
other components.

Hollaar's research was performed in the context of an
information retrieval system [21]. Many of these systems employ
inverted files. His work yielded a hardware structure for
combining ordered lists which can be used on ordered inverted
files.
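
Combining ordered lists is the kind of operation that a single merge pass handles well, which is part of why it is a natural candidate for hardware. The sketch below is a software illustration only and is not Hollaar's design; it assumes each list holds record addresses in strictly increasing order, and the function names are hypothetical.

```python
def intersect_sorted(a, b):
    """Return the addresses present in both ordered lists (logical AND)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union_sorted(a, b):
    """Return the addresses present in either ordered list (logical OR)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of the two tails is non-empty
    out.extend(b[j:])
    return out

# Hypothetical inverted lists for two keywords.
list_a = [2, 5, 9, 14]
list_b = [5, 7, 14, 20]
both = intersect_sorted(list_a, list_b)
either = union_sorted(list_a, list_b)
```

Each list is read once from front to back and no random access is needed, so the work is proportional to the combined length of the two lists.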

Singhania's and Berra's work is related to a hardware
implementation of a data directory for a very large data base.
The architecture consists of pipelining an N-level hierarchical
directory based on multiple keys. Each level of the hierarchy
is processed by an associative memory. The architecture is
designed to provide concurrent processing, i.e., processing within
an AM and within the pipeline.

CONCLUSION 

It is not obvious that any one of the above hardware devices
will solve all the problems that will occur in developing a DBMS
for a large integrated data base. This feeling was also reflected
by Berra [4], ". . . it appears at the present time that
associative memories and processors have a potential for reducing
some of the pressing problems in the field but they are by no
means the final answer; only a step on the way to more
sophisticated devices." He goes on to state that,

. . . there  are  thousands  of  data  base  problems  in 
existence  today  that  would  support  the  development 
of  computers  strictly  for  the  solution  of  these 
problems.  Imagine  the  vast  amounts  of  data  the 
various  government  agencies  must  manage,  let  alone 
all  of  the  industrial  organizations  and  businesses 
that  are  aspiring  to  integrated  corporate  data 
bases.  Imagine  also  the  vast  amount  of  computer 
resources  that  are  wasted  in  processing  largely 
non-sequential data on sequential computers.

Similar words by Su and Lipovski were cited earlier in this chapter.
However, they do not say how one determines which software functions,
if implemented in hardware, will increase a machine's operating
efficiency or how one finds these elusive software functions.
Berra [4] sums it up when discussing the future, ". . . It seems
clear to this author that for the next few years the associative
device will remain essentially a peripheral to a sequential
computer. One reason is that we just don't know enough about the
generic functions that must be performed in data base management
and therefore can't really define what we need from the hardware."
Therefore, the problem is to first define these generic functions
and then implement them in hardware. Once this is completed a DBM
computer will become a reality.


Chapter  III 

PROBLEM DEFINITION AND METHODOLOGY


INTRODUCTION 

In the previous chapter a review was presented of a selective
set of significant papers describing the modeling of data
for DBM and those architectures being built and investigated for
the DBM field. The papers discussing the modeling of the data
were presented to determine if any of their approaches could be
used in defining the generic functions of DBM. The contents of
the remaining papers provided a baseline for the functions of
DBM that have been addressed by their implementation in hardware.
It was also pointed out that if a DBM computer is to be a reality,
then the generic functions of DBM must first be defined.

Provided  in  this  chapter  is  a definition  of  the  specific 
problem  this  research  addresses  and  a methodology  for  its 
solution.  The  methodology  is  presented  as  an  overview  describing 
the  four  phases  of  the  solution  procedure.  Following  this  over- 
view, each  phase  is  discussed  at  a more  detailed  level. 


PROBLEM DEFINITION


Succinctly stated, the problem addressed in this research
is first to develop a mathematical modeling concept that can be
utilized to model DBMSs from the user level down to the bit level
and secondly to utilize this mathematical base to define in detail
some of the functions that must be performed in DBM. Utilizing
the definition of these functions, the next step is to consider
their implementation in hardware. Given a proposed hardware
implementation, the final step in this research is to evaluate the
hardware and compare it with more conventional approaches.


METHODOLOGY  OF  SOLUTION 

The method used to approach the above problem can be divided
into four phases. The development of a mathematical language for
possible use as a base for modeling DBM is dealt with in the first
phase. It is required that the language be applicable from the
user's view of the data down to the bit representation of the data
occurrences. Phase 2 encompasses the application of this
mathematical language. This modeling procedure yields the levels of
DBM and some of its functions. In the third phase the levels or
functions that have been considered for implementation in a
hardware design are compared with those discerned from the second
phase. A hardware design is developed for a subset of those levels
and functions. The design is evaluated in the fourth phase by
comparing the hardware time with the time required for a
conventional sequential computer to perform a number of functions
of DBM.


PHASE  1:  A MATHEMATICAL  BASE  FOR  DATA  BASE  MANAGEMENT 

In the previous chapter, the literature survey of the
significant papers describing models developed for DBM indicated that
none of them provided the desired capability. However, Child's
model, applying the concept of order to sets, came the closest and
therefore was pursued in the expectation that a mathematical base
for modeling DBM could be obtained.

The properties of this base are similar to the properties
of set theory. For example, the properties of containment, union,
and intersection are developed. In the process of developing these
properties, new operators are required and hence are defined.

The relationships between sets and sets with order are
developed to finalize the mathematical language. This entails
the derivation of how sets with order are defined from sets, and
of the inverse relationship. The next level of relationships derived
is concerned with characteristic functions for sets and for sets
with order. Characteristic functions are investigated because
they provide a convenient method of implementing this mathematical
language in hardware.

PHASE 2: DATA BASE MANAGEMENT AND SETS

The mathematical language is used to model DBM levels and
functions. DBM levels are divided into four parts and can be
considered as being static. These four parts are:

1) the user computer interface;

2) the attribute and file or relationship (F/R) names;

3) the modifiers of the attribute and F/R names; and

4) the occurrences of the attributes and F/Rs.

DBM functions are not divided into any specific parts. They are
dynamic and operate on the above levels to perform the jobs
requested by a user of a DBMS.

PHASE 3: HARDWARE

The design procedure is pursued in four non-disjoint steps.
The first step interfaces the hardware with a sequential computer.
Next, a design of the hardware at a more detailed level, considering
its control, data storage, and data transfer, is performed.
The hardware at the logic gate and flip-flop levels is designed
in the third step. The final step ensures that the design in each
of the above three steps will perform the desired functions.

PHASE 4: HARDWARE EVALUATION

The method used to accomplish the hardware design evaluation
can be divided into five steps. The development of the timing
equations for sub-functions is accomplished in the first step.
These sub-functions can be thought of as "microfunctions" that can
be put together to form a DBM function. Defined next are a number
of job types that can be submitted by a user to a DBMS. Together,
these job types entail all functions that the hardware would have
to perform in a "real" environment. Timing equations are then
developed for each of these job types by summing the timing equations
for the proper microfunctions. In step three the data structure
and software techniques necessary to perform the same functions
implemented on a sequential computer are designed. Development
of the timing equations for each of the above job types on the
sequential computer is performed next. The fifth step compares
the timings for each of the job types performed on the hardware
and the sequential computer.

SUMMARY 

This chapter contained the definition of the problem this
research addresses, and the methodology utilized in solving this
problem. The methodology was described as a four-phase overview
with each phase being discussed at a more detailed level. The
next four chapters contain the details of each of the four phases
respectively.



Chapter  IV 

A MATHEMATICAL  BASE  FOR  DATA  BASE  MANAGEMENT 


INTRODUCTION 


The  objective  of  this  chapter  is  to  provide  a mathematical 
base that can be used to model data base management and its
implementation  in  hardware.  The  chapter  centers  around  ten 
properties  of  sets.  These  properties  are  defined  early  in  the 
chapter.  This  is  followed  by  the  development  of  a concept  called 
Data  Processing  Sets  which  are  sets  with  order.  Then  ten 
similar  properties  are  defined  for  Data  Processing  Sets.  The 
next  portion  of  the  chapter  involves  the  relationships  between 
sets  and  Data  Processing  Sets.  This  provides  a background  on 
the  similarities  and  differences  between  the  two  concepts. 

Data Processing Sets provide a mathematical base for the
modeling of DBM at the user's level. To extend this mathematical
base such that it can be utilized for modeling DBM at a digital
level and/or to design computer hardware requires the introduction
of Data Processing Characteristic Sets. The final portion of
this chapter defines the above ten properties, with examples,
for Data Processing Characteristic Sets.

A REVIEW  OF  SETS  AND  THEIR  PROPERTIES 

A set may be thought of as a collection of elements. Sets
will usually be denoted by one or more upper case letters.
However, a set's elements will usually be indicated by lower case
letters and may be enclosed by { }. An example of a set and its
elements is:

$X = \{x_1, x_2, \ldots, x_n\}$,

where X is a set and $x_1, x_2, \ldots, x_n$ are the elements contained
in X.

Some of the various properties of sets that are used throughout
this research are presented below with the aid of the defined
symbols in Table IV-1.

I Subset (or Containment)

$X \subseteq Y \leftrightarrow (x \in X \rightarrow x \in Y)$

II Proper Subset

$X \subset Y \leftrightarrow (X \subseteq Y) \wedge (\exists (x \in Y) \ni x \notin X)$

III Set Equality

$X = Y \leftrightarrow (X \subseteq Y) \wedge (Y \subseteq X)$

Table IV-1
Set Theory Notation

SYMBOL                     MEANING

$X \subseteq Y$            X is a subset of Y
$X \subset Y$              X is a proper subset of Y
$X \wedge Y$               X AND Y
$X \vee Y$                 X OR Y (inclusive)
$X \rightarrow Y$          X Implies Y
$X \leftrightarrow Y$      (X Implies Y) AND (Y Implies X)
$x \in X$                  x is contained in X
$x^j \in X$                $x^j$ is contained in the DP set X
$X \cup Y$                 X Union Y
$X \cap Y$                 X Intersect Y
$X - Y$                    X And Not Y (Difference)
$\forall$                  For All
iff                        If and only if
$\exists$                  There exists
$\ni$                      Such that
$\phi$                     Null Set
$x \notin X$               x is not contained in X
$x^j \notin X$             $x^j$ is not contained in the DP set X
$a_j^i(\in Z) = 0$         $a_j^i$, an element of the set Z, is equal to zero.

of the original set. That is, for every set X, the
set of all subsets of X, called P{X}, is the power
set of X. Symbolically,

$P\{X\} = \{Y : Y \subseteq X\}$.

For example, if $X = \{x_1, x_2\}$,

then $P\{X\} = \{\phi, \{x_1\}, \{x_2\}, \{x_1, x_2\}\}$.

VIII Power of a Set

The power of a set is the number of unique elements
contained in the set. For example, the power of sets
$X = \{x_1, x_2\}$ and $Y = \{x_1, x_2, x_2\}$ is two.

IX Power of a Power Set

If the power of a set X is T then the power of the power
set of X (P{X}) is $2^T$. For example, if $X = \{x_1, x_2\}$, then
the power of X is two and the power of the power set of
X ($P\{X\} = \{\phi, \{x_1\}, \{x_2\}, \{x_1, x_2\}\}$) is $2^2 = 4$. This
can be shown by noting that the power set is the set
of all the subsets of the original set plus, of course,
the null set (which is a subset of every set). Therefore,
if the power of the original set is T, then the
number of elements in its power set is the sum of the
combinations of T elements taken one at a time, two at
a time, ..., T at a time, plus the null set, i.e.

$$\sum_{i=1}^{T} \binom{T}{i} + 1 = 2^T$$

X Characteristic Function of a Set

The domain of a characteristic function (c.f.) for a
set X is the power set of X. Its range is the set
{0,1}. The value of c.f. of some element of X related
to some element of P{X} is dependent on whether or
not that element of X is contained in the element of
P{X}. This can be shown as:

$$\text{c.f.}: X \rightarrow \{0,1\} \quad \text{where} \quad \text{c.f.}_{X_i}(x_j) = \begin{cases} 1 & \text{if } x_j \in X_i \\ 0 & \text{if } x_j \notin X_i, \end{cases}$$

$X_i \subseteq X$ and $x_j \in X$.

Consider an example: $X = \{x_0, x_1, x_2\}$ and $X_1 \subseteq X$
where $X_1 = \{x_0, x_1\}$. Then the c.f., with respect to
$X_1$, is equal to $\text{c.f.}_{X_1}(x_0) = 1$, $\text{c.f.}_{X_1}(x_1) = 1$, and
$\text{c.f.}_{X_1}(x_2) = 0$. Note from this example that the
inverse of this characteristic function ($\text{c.f.}^{-1}$) is not
a function since it is not one-to-one.
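The power set and characteristic function properties can be checked with a short sketch. This is an illustration only, not part of the original development: the Python function names `power_set` and `cf` and the string labels for the elements are assumptions made here.

```python
from itertools import combinations

def power_set(elements):
    """Return every subset of `elements`, the null set included."""
    items = list(elements)
    return [frozenset(c)
            for size in range(len(items) + 1)
            for c in combinations(items, size)]

def cf(subset, element):
    """Characteristic function with respect to `subset`: 1 if member, else 0."""
    return 1 if element in subset else 0

# The example from the text: X = {x0, x1, x2}, X1 = {x0, x1}.
X = ["x0", "x1", "x2"]
X1 = frozenset({"x0", "x1"})

subsets = power_set(X)              # the power set P{X}
values = [cf(X1, x) for x in X]     # c.f. values with respect to X1
print(len(subsets), values)
```

Because `cf(X1, "x0")` and `cf(X1, "x1")` both return 1, the value 1 cannot be mapped back to a single element, which is the sense in which the inverse is not a function.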


DATA  PROCESSING  SETS 

If a mathematical base is going to model DBM it should at
least be capable of modeling the Input/Output to a DBMS. This
was a major driving factor in the development of Data Processing
(DP) Sets. Consider trying to model the marks made across this
page as a set. For each advancement of the typewriter carriage
the typewriter can place only one character on the page, but
each character may appear in more than one position on a line.
If two words or two lines of print are identical, then not only
must the characters be the same but their respective positions
must also be the same. This mode of thinking has led to the basis
of the proposed mathematical base.

Imagine  trying  to  model  the  lines  of  type  by  using  set 
theory.  If  the  following  two  phrases 

1)  "she  is  smarter  than  him" 
and 

2)  "he  is  smarter  than  her" 

are modeled as "sets" (ignoring blanks), then they become
equivalent by the Axiom of Extent [40]. That is,

{s,h,e,i,s,s,m,a,r,t,e,r,t,h,a,n,h,i,m} = {s,h,e,i,m,a,r,t,n},

{h,e,i,s,s,m,a,r,t,e,r,t,h,a,n,h,e,r} = {h,e,i,s,m,a,r,t,n} and

{h,e,i,s,m,a,r,t,n} = {s,h,e,i,m,a,r,t,n}.

However these "sets" or relations in the real world are not equal
because:

1) there are a different number of occurrences of elements
(e.g. s occurs three times in the first phrase and twice in the
second phrase); and

2) the order of the elements in the two phrases is not the
same.

"Sets" which have these properties, plus the additional property
that two or more elements cannot occupy the same position in a
set, shall be called DP sets. An element of a DP set shall have
an integer superscript greater than zero. The value of the
superscript designates the order (or position) of the element in
the DP set. An alternate notation will sometimes be used where
the element's superscript value will be implied, by the element's
position in the DP set, from one to the number of elements in the
DP set. For example, let X and Y be DP sets where

$X = \{x_2^1, x_1^3, \ldots, x_0^k, \ldots\}$

$Y = \{a, b\}$.

The DP set X is interpreted as having element $x_2$ being the first
element in X, $x_1$ being the third element in X, $x_0$ being the k-th
element in X, etc. The DP set Y is interpreted as having a as
its first element and b as its second element.
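The two-phrase argument can be checked mechanically. The sketch below is illustrative only: it assumes a DP set can be represented as a Python set of (value, position) pairs, and the helper name `as_dp_set` is hypothetical.

```python
# The two phrases from the text, with blanks ignored.
phrase_1 = "she is smarter than him".replace(" ", "")
phrase_2 = "he is smarter than her".replace(" ", "")

def as_dp_set(text):
    """Pair each character with its position, counting from one."""
    return {(ch, pos) for pos, ch in enumerate(text, start=1)}

# As ordinary sets the phrases collapse to the same nine letters.
same_as_sets = set(phrase_1) == set(phrase_2)

# With position attached to each element they are no longer equal.
same_as_dp_sets = as_dp_set(phrase_1) == as_dp_set(phrase_2)

print(same_as_sets, same_as_dp_sets)
```

Under this representation an ordinary set forgets both repetition and order, while the (value, position) pairs keep both.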

A DP set is a set of elements where each element has two
parameters. One parameter is the "value" of the element (e.g.
$x_1$, a, $y_2$, z, etc.) and the other parameter is the position the
element occupies in the DP set. In a DP set two or more elements
may have the same value. Consider the following unequal DP sets,

$\{d^1, a^2\} \neq \{a^1, d^2, d^3\}$.

The $d^2$ and $d^3$ elements of $\{a^1, d^2, d^3\}$ have the same value d.
However, two or more elements of a DP set cannot occupy the same
position. For example, $X = \{a^1, d^1, d^2\}$ and $Y = \{d^1, a^1\}$ are not
DP sets. Therefore, an element $x^j$ is contained in a DP set if
and only if the j-th position of the DP set has an element with
value x and j is an integer greater than zero. Stated in another
way, let N = {1, 2, ...} be the set of positive integers; if $x^j$
is an element of a DP set X and there is another element in X,
say y, which has a superscript j, then x = y.

Given the above, a formal definition of a DP set can be
stated. X is a DP set provided that:

1) $\exists\, x^j \in X$;

2) $j \in N$ where N = {1, 2, ...}; and

3) if $y^j \in X$ then x = y.
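The three conditions can be read as a validity check. The following is a minimal sketch under stated assumptions: elements are given as (value, position) pairs, condition 1 is read as "the collection is non-empty", and the function name `is_dp_set` is hypothetical rather than taken from the source.

```python
def is_dp_set(pairs):
    """Check the three conditions of the formal DP set definition.

    `pairs` is an iterable of (value, position) tuples.
    """
    pairs = list(pairs)
    if not pairs:                      # condition 1: some element exists
        return False
    seen = {}
    for value, position in pairs:
        # condition 2: the superscript is a positive integer
        if not isinstance(position, int) or position < 1:
            return False
        # condition 3: a position may hold only one value
        if position in seen and seen[position] != value:
            return False
        seen[position] = value
    return True

print(is_dp_set([("a", 1), ("d", 2), ("d", 3)]))   # repeated value, distinct positions
print(is_dp_set([("d", 1), ("a", 1)]))             # two values share position 1
```

The first call describes a valid DP set because repeated values are allowed; the second fails because two different values claim the same position.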


PROPERTIES  AND  OPERATORS  OF  DATA  PROCESSING  SETS 

To utilize DP sets in the modeling of DBM, some basic
properties (similar to those contained in set theory) and
operators must be defined. Referring again to the symbols in
Table IV-1, consider the following properties and operators for
DP sets.

XI DP Set Subset (or Containment)

$X \subseteq Y \leftrightarrow (x^i \in X \rightarrow x^i \in Y)$

XII DP Set Proper Subset

$X \subset Y \leftrightarrow (X \subseteq Y) \wedge (\exists (x^i \in Y) \ni x^i \notin X)$

XIII DP Set Equality

$X = Y \leftrightarrow (X \subseteq Y) \wedge (Y \subseteq X)$

XIV DP Set Intersection

$[X \cap Y = Z] \leftrightarrow [(x^i \in Z) \leftrightarrow (x^i \in X) \wedge (x^i \in Y)]$

In order to define the union and difference properties of
DP sets, two new operators must first be defined. These operators
are called Pull ($P_i$) and Push ($P^i$). $P_i(X)$ is defined as an
operator which subtracts the integer i from each element's
superscript in X. $P^i(X)$ is defined as an operator which adds
the integer i to each element's superscript in X. However, note
that if the value of an element's superscript is equal to or
less than zero, then the element is not contained in X, since
the superscript must be contained in N. Consider the following
example.

$Y = \{a^s, b^r, a^t, \ldots, u^q\}$

$P^i(Y) = \{a^{s+i}, b^{r+i}, a^{t+i}, \ldots, u^{q+i}\}$

$P_i(Y) = \{a^{s-i}, b^{r-i}, a^{t-i}, \ldots, u^{q-i}\}$.


The union and difference properties can now be defined.

XV DP Set Union

$[X \cup Y = Z] \leftrightarrow [(x^i \in Z) \leftrightarrow (x^i \in X) \vee (x^i \in Y)]$

An example of using the push operator and the union
property would be in expressing the DP set {a, d, d} as
the union of $\{a^1\}$ and $\{d^1, d^2\}$, i.e.

$\{a^1\} \cup P^1\{d^1, d^2\} = \{a^1, d^2, d^3\}$.

XVI DP Set Difference

$[X - Y = Z] \leftrightarrow [(x^i \in Z) \leftrightarrow (x^i \in X) \wedge (x^i \notin Y)]$

An example of the pull operator and the difference
property would be in expressing the DP set $\{d^2\}$ as the
difference of $\{d^5, d^8\}$ and $\{d^5, d^7, d^9, d^{10}\}$, i.e.

$P_3\{d^5, d^8\} - \{d^5, d^7, d^9, d^{10}\} = \{d^2\}$.
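The Push and Pull operators and the two worked examples can be reproduced with a small sketch. It assumes, for illustration only, that a DP set is held as a Python set of (value, position) pairs, so that DP set union and difference coincide with the built-in set operators; the names `push` and `pull` are hypothetical.

```python
def push(dp_set, i):
    """Push: add the integer i to every element's position."""
    return {(value, pos + i) for value, pos in dp_set}

def pull(dp_set, i):
    """Pull: subtract i from every position.

    Elements whose position would fall to zero or below are dropped,
    since positions must be positive integers.
    """
    return {(value, pos - i) for value, pos in dp_set if pos - i >= 1}

# Union example: {a^1} U Push_1{d^1, d^2} = {a^1, d^2, d^3}
union_result = {("a", 1)} | push({("d", 1), ("d", 2)}, 1)

# Difference example: Pull_3{d^5, d^8} - {d^5, d^7, d^9, d^10} = {d^2}
difference_result = (pull({("d", 5), ("d", 8)}, 3)
                     - {("d", 5), ("d", 7), ("d", 9), ("d", 10)})

print(sorted(union_result), sorted(difference_result))
```

One caveat of this representation: the built-in union does not itself prevent two different values from landing on the same position, so a fuller model would validate the result.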
XVII DP Set Power Set

A power set of a DP set is identical to property VII for
sets, i.e. if X and Y are DP sets then

$P\{X\} = \{Y : Y \subseteq X\}$

XVIII Power of a DP Set

The power of a DP set is the number of unique elements
contained in the set. For example, the power of DP sets
$X = \{x_1, x_2\}$ and $Y = \{x_1, x_2, x_2\}$ is 2 and 3 respectively.
(See the example given for property VIII.)

XIX Power of a DP Set Power Set

The power of a DP set power set is identical to property
IX for sets, i.e. if the power of a DP set is T then the
power of its DP set power set is $2^T$.

XX Characteristic Function of a DP Set

The domain of a characteristic function (c.f.) for a DP set
X is the power set of X. Its range is the set {0,1}. The
value of c.f. of some element of X related to some element
of P{X} is dependent on whether or not that element of X
is contained in the element of P{X}.

$$\text{c.f.}: X \rightarrow \{0,1\} \quad \text{where} \quad \text{c.f.}_{X_i}(x^j) = \begin{cases} 1 & \text{if } x^j \in X_i \\ 0 & \text{if } x^j \notin X_i, \end{cases}$$

$X_i \subseteq X$ and $x^j \in X$.
DATA PROCESSING SETS AND SETS

The contents of the next chapter are concerned with the
modeling of DBM. This modeling utilizes the above properties and
operators in a more realistic manner than the above examples.
Along with DP sets, the modeling also employs sets and their
properties. To provide a better understanding of this modeling,
the relationships between sets and DP sets are discussed below.

The concept of DP sets is actually an extension of set
theory which adds the parameter of position to the elements in a
set. By removing this extension there exists a one-to-one
relationship between a DP set and a set. That is, given any DP
set there exists a function (f) that maps this DP set to one and
only one set. The domain of f is a DP set and the range of f is
a set. (Consider f as a function that removes the position
parameter imposed on the elements of a DP set.)

Let $A_1$ be a DP set and A a set where $A_1$ is the domain of f
and A is the range of f. Then f is defined in such a way that if
$f(x^i) = y$ then $x^i \in A_1$, $y \in A$, and x = y.

For every DP set $A_1$, the function f maps its elements into
one and only one set A. Let this set A be called the Basis set
for the DP set $A_1$. Therefore, a Basis set can be defined as:

A is a Basis set of a DP set $A_1$ if and only if for every
$(x^i \in A_1)\ \exists\, (y \in A) \ni (x = y)$ and if $(y \in A)\ \exists$ one or more $(i \in N) \ni$
$(x^i \in A_1)$ and x = y.

It should be noted that although there is only one Basis
set for each DP set, more than one DP set may have the same
Basis set.

To illustrate the above concepts, consider the following
examples:

Example 1: Let $A_1 = \{a^1, d^3, d^2\}$
then $f(a^1) = a$,
$f(d^3) = d$,
and $f(d^2) = d$.

Example 2: Let $A_1 = \{a^1, d^2, d^3\}$,
then the Basis set of $A_1$ is A = {a, d}.

Example 3: Let $A_1 = \{a^1, d^2, d^3\}$ and $A_2 = \{d^1, a^2\}$
then the Basis set for $A_1$ is {a, d} and the Basis
set for $A_2$ is {a, d}. Note that {a, d, c} is not a Basis
set for either $A_1$ or $A_2$ since there does not exist a
$j \in N$ such that $c^j \in A_1$ or $c^j \in A_2$.
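The mapping f and the three examples can be sketched as follows. This is an illustration under the assumption that a DP set is a Python set of (value, position) pairs; the names `f` and `basis_set` mirror the text but the code itself is not from the source.

```python
def f(element):
    """Remove the position parameter from a single DP set element."""
    value, _position = element
    return value

def basis_set(dp_set):
    """Map a DP set onto its Basis set by dropping every position."""
    return {f(element) for element in dp_set}

A1 = {("a", 1), ("d", 2), ("d", 3)}
A2 = {("d", 1), ("a", 2)}

# Both DP sets share the Basis set {a, d}; {a, d, c} is not a Basis
# set for either, because no element with value c occurs in them.
print(basis_set(A1), basis_set(A2))
```

The fact that `A1` and `A2` differ while their Basis sets agree illustrates why more than one DP set may have the same Basis set.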

DEFINING PROCEDURE FOR DATA PROCESSING SETS

The above concepts of power sets and the power of a set are
needed to help describe a procedure for defining DP sets. This
procedure can best be expressed by first recognizing the existence
of a non-empty set A. Then define a subset of A (say X) as a
Basis set where X is an element of the power set of A ($X \in P\{A\}$).
Given the elements in X, an ordering or permutation with
replacement $\bar{X}$ can be defined. (Note that $\bar{X}$ is not defined as a set.)
The last step is to define on $\bar{X}$ a DP set Y. An example to
illustrate this procedure is now given. Consider the keys on a
typewriter. Let the available symbols be defined as a set
Alphanumeric (AN) = {A, B, ..., 0, 1, ..., 1, ..., ]}. Let X be
a Basis set where $X \in P\{AN\}$ and X = {C, P, A, R, O}, $\bar{X}$ = CAPRARO and
therefore the DP set $Y = \{C^1, P^3, O^7, R^4, R^6, A^2, A^5\}$.
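The last two steps of the procedure (ordering, then DP set) can be sketched in a few lines. The representation as (value, position) pairs and the function name `dp_set_from_ordering` are assumptions made for this illustration.

```python
def dp_set_from_ordering(ordering):
    """Attach positions 1, 2, ... to the symbols of an ordering."""
    return {(symbol, pos) for pos, symbol in enumerate(ordering, start=1)}

basis = {"C", "P", "A", "R", "O"}     # the Basis set X
ordering = "CAPRARO"                  # a permutation with replacement
Y = dp_set_from_ordering(ordering)    # the resulting DP set

print(sorted(Y, key=lambda element: element[1]))
```

Dropping the positions from `Y` recovers the Basis set, which matches the relationship between DP sets and Basis sets described in the previous section.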

It can be seen from the information provided thus far that
given a non-null Basis set X, an infinite number of DP sets can
be defined. This is true since there are no restrictions on the
power of the Basis set, the power of the DP set and the limit of
the values of the element's superscripts in the DP set. For the
modeling purposes to be discussed in the next chapter, this
infinite upper bound is generally unrealistic. This can be
rectified by placing a finite limit k on the power of the Basis
set, a finite limit T on the power of the DP sets, and
restricting the DP sets to be what is called a Normal DP set. A Normal
DP set is a DP set whose minimum superscript value is one and
whose maximum superscript value is equal to the power of the DP
set. For example, $\{a^1, b^2\}$ is a Normal DP set but $\{a^2, b^3\}$ is not
a Normal DP set.

Given the above restrictions, if a Basis set X has a finite
power of k and the Normal DP sets defined with X as a Basis set
have a power equal to or less than T (where T is finite), then
the maximum number of Normal DP sets that can be defined is

$$\sum_{i=1}^{T} k^i$$

For example, let X = {A} and T = 2. The maximum number of Normal
DP sets that can be defined on the Basis set X is

$$\sum_{i=1}^{2} 1^i = 2$$

These Normal DP sets are {A} and {A, A}. The expression $k^i$ is
obtained by viewing the situation as the placing of k things in
i boxes with replacement, i.e. there are k ways of filling box 1,
k ways of filling box 2, ..., k ways of filling box i. Therefore
there are $k^i$ ways of filling i boxes with k things, with
replacement.

Extending this concept further, a super DP set of a Basis
set X can be defined as:

A super DP set of a Basis set X is that set containing the
null set and all the possible Normal DP sets that can be defined
on X. For the examples above, with k and T finite (1 and 2
respectively) and the DP sets Normal, the super DP set of X is
$\{\phi, \{A^1\}, \{A^1, A^2\}\}$. The power of a super DP set can be
determined by

$$\sum_{i=0}^{T} k^i$$

For this example the power is

$$\sum_{i=0}^{2} 1^i = 3.$$
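The counting argument can be checked by brute-force enumeration. The sketch below is illustrative: it represents a Normal DP set as a tuple whose index implies the position, follows the text's counting (k choices for each of i boxes), and uses hypothetical function names.

```python
from itertools import product

def normal_dp_sets(basis, max_power):
    """Enumerate Normal DP sets over `basis` with power 1..max_power.

    Each result is a tuple; the element at index n occupies position n + 1.
    """
    symbols = sorted(basis)
    return [combo
            for power in range(1, max_power + 1)
            for combo in product(symbols, repeat=power)]

def count_by_formula(k, max_power):
    """Sum of k**i for i = 1..max_power, as in the text."""
    return sum(k ** i for i in range(1, max_power + 1))

found = normal_dp_sets({"A"}, 2)                 # the example: k = 1, T = 2
super_power = len(found) + 1                     # add the null set
larger = len(normal_dp_sets({"A", "B"}, 3))      # k = 2, T = 3

print(found, super_power, larger, count_by_formula(2, 3))
```

For the one-symbol example the enumeration yields the two Normal DP sets named in the text, and adding the null set gives the power of the super DP set.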


DATA  PROCESSING  CHARACTERISTIC  SETS 

One of the major thrusts of this research is to show that
DP sets and sets can be used to model DBM from the user's level
of data to the bit representation of data. So far, the
mathematical base for modeling the user's level of data has been
presented. To extend this mathematical base to the bit level
requires an extension to the property called the characteristic
function of a DP set (property XX).

An extended characteristic function (e.c.f.) can be defined
for DP sets whose inverse is also a function. Following the
notation above let 1) A be a Basis set whose power is y, 2) $\bar{A}$
be defined as one of the y! possible permutations of the
elements of A, 3) $A_o$ be defined as a Normal DP set defined on $\bar{A}$
and 4) a Normal DP set whose elements have values of zeros and/or
ones be defined as a Data Processing Characteristic Set (DPCS).
Now an e.c.f. can be defined as:

$$\text{e.c.f.}: A \rightarrow \text{DPCS} \quad \text{where} \quad \text{e.c.f.}_{A_i}(a_j^k) = \begin{cases} 1^k & \text{if } a_j \in A_i \\ 0^k & \text{if } a_j \notin A_i \end{cases}$$

where $A_i \subseteq A$, (Power of {DPCS}) = (power of A) = y, and $a_j^k \in A_o$.
The domain of this e.c.f. is the power set of A. Its range is a
set of $2^y$ elements (see properties IX and XIX), each being a DPCS.
For an example of the above definition of an e.c.f., let:

$A = \{a_0, a_1, a_2\}$
$\bar{A} = a_0, a_1, a_2$
$A_o = \{a_0^1, a_1^2, a_2^3\}$

$A_1 = \phi$               $A_5 = \{a_0\}$
$A_2 = \{a_0, a_1\}$       $A_6 = \{a_1\}$
$A_3 = \{a_0, a_2\}$       $A_7 = \{a_2\}$
$A_4 = \{a_1, a_2\}$       $A_8 = \{a_0, a_1, a_2\}$

$\therefore P\{A\} = \{A_1, A_2, A_3, A_4, A_5, A_6, A_7, A_8\}$

$\text{e.c.f.}_{A_1}(A_o)$ = DPCS for $A_1$ = {0,0,0} = $B_1$
$\text{e.c.f.}_{A_2}(A_o)$ = DPCS for $A_2$ = {1,1,0} = $B_2$
$\text{e.c.f.}_{A_3}(A_o)$ = DPCS for $A_3$ = {1,0,1} = $B_3$
$\text{e.c.f.}_{A_4}(A_o)$ = DPCS for $A_4$ = {0,1,1} = $B_4$
$\text{e.c.f.}_{A_5}(A_o)$ = DPCS for $A_5$ = {1,0,0} = $B_5$
$\text{e.c.f.}_{A_6}(A_o)$ = DPCS for $A_6$ = {0,1,0} = $B_6$
$\text{e.c.f.}_{A_7}(A_o)$ = DPCS for $A_7$ = {0,0,1} = $B_7$
$\text{e.c.f.}_{A_8}(A_o)$ = DPCS for $A_8$ = {1,1,1} = $B_8$

Note that since the e.c.f. on $A_o$ fixes the order of the
elements of A by definition, then the e.c.f. is one-to-one and
onto the set of DPCSs. Therefore, the e.c.f. inverse ($\text{e.c.f.}^{-1}$)
exists and is also one-to-one and onto the power set of A. For
the example above:

$\text{e.c.f.}^{-1}\{0,0,0\} = A_1$
$\text{e.c.f.}^{-1}\{1,1,0\} = A_2$
$\text{e.c.f.}^{-1}\{1,0,1\} = A_3$
$\text{e.c.f.}^{-1}\{0,1,1\} = A_4$
$\text{e.c.f.}^{-1}\{1,0,0\} = A_5$
$\text{e.c.f.}^{-1}\{0,1,0\} = A_6$
$\text{e.c.f.}^{-1}\{0,0,1\} = A_7$
$\text{e.c.f.}^{-1}\{1,1,1\} = A_8$.

Pictorially, the e.c.f., its inverse ($\text{e.c.f.}^{-1}$), and $A_o$ can be
displayed in Figure IV-1.
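The e.c.f. and its inverse amount to encoding a subset as a bit vector over a fixed ordering, and decoding it again. The sketch below illustrates this for the three-element example; the tuple representation of a DPCS and the names `ORDER`, `ecf`, and `ecf_inverse` are assumptions introduced here.

```python
from itertools import combinations

# The fixed ordering of the Basis set A = {a0, a1, a2}.
ORDER = ("a0", "a1", "a2")

def ecf(subset, order=ORDER):
    """Map a subset of A to its DPCS: 1 where the element is present."""
    return tuple(1 if element in subset else 0 for element in order)

def ecf_inverse(dpcs, order=ORDER):
    """Recover the subset of A from a DPCS under the same ordering."""
    return frozenset(e for e, bit in zip(order, dpcs) if bit == 1)

# Every subset of A, i.e. the power set P{A}.
all_subsets = [frozenset(c)
               for size in range(len(ORDER) + 1)
               for c in combinations(ORDER, size)]
codes = [ecf(s) for s in all_subsets]

print(ecf({"a0", "a1"}), sorted(ecf_inverse((0, 1, 1))))
```

Because the ordering is fixed, the eight subsets receive eight distinct codes and each code decodes to exactly one subset, which is why the inverse is also a function.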

The concept of DP Characteristic sets (DPCSs) provides a
basis for the modeling of non-numerical processes performed by
digital computers. This occurs because they can represent
non-binary element sets as binary element sets and therefore are
compatible with digital processing. Consider the following set
properties defined for DPCSs, remembering that each DPCS can be
mapped back to a non-binary element DP set.

Let $O_o$ and $O_m$ be two DPCSs defined on the same DP set $A_o$.
They therefore have the same power (n) and can be defined as
follows:

$O_o = \{a_1^1, a_2^2, \ldots, a_n^n\}$

where $\forall i$ ($i = 1, \ldots, n$), $a_j^i = 0$ or 1 and $a_j^i \in O_o$

and

$O_m = \{a_1^1, a_2^2, \ldots, a_n^n\}$

where $\forall i$ ($i = 1, \ldots, n$), $a_j^i = 0$ or 1 and $a_j^i \in O_m$.

The properties of DPCSs can now be expressed using the notation
in Table IV-1.

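XXI DPCS Subset

$O_o \subseteq O_m$ iff $\forall i$ ($i = 1, \ldots, n$): if $a_j^i(\in O_o) = 0$ then
$a_j^i(\in O_m) = 0$ or 1, and if $a_j^i(\in O_o) = 1$ then $a_j^i(\in O_m) = 1$,
where $a_j^i(\in Z) = 0$ implies that $a_j^i$ is an element of the set Z
and it is equal to zero.

XXII DPCS Proper Subset

$O_o \subset O_m$ iff [$\forall i$ ($i = 1, \ldots, n$): (if $a_j^i(\in O_o) = 0$ then
$a_j^i(\in O_m) = 0$ or 1, and if $a_j^i(\in O_o) = 1$ then $a_j^i(\in O_m) = 1$)
and $\exists$ at least one $i \ni a_j^i(\in O_o) = 0$ and $a_j^i(\in O_m) = 1$]

XXIII DPCS Equality

$O_o = O_m$ iff $O_o \subseteq O_m$ and $O_m \subseteq O_o$, i.e.

$(O_o = O_m) \leftrightarrow \forall i$ ($i = 1, \ldots, n$): if $a_j^i(\in O_o) = 0$ then
$a_j^i(\in O_m) = 0$, and if $a_j^i(\in O_o) = 1$ then $a_j^i(\in O_m) = 1$.

XXIV DPCS Intersection

$(O_o \cap O_m) = O_k$ where the power of $O_k$ = n and, $\forall i$ ($i = 1, \ldots, n$),

$$a_j^i(\in O_k) = \begin{cases} 1 & \text{if } a_j^i(\in O_o) = 1 \text{ and } a_j^i(\in O_m) = 1 \\ 0 & \text{otherwise.} \end{cases}$$

XXV DPCS Union

$(O_o \cup O_m) = O_k$ where the power of $O_k$ = n and, $\forall i$ ($i = 1, \ldots, n$),

$$a_j^i(\in O_k) = \begin{cases} 0 & \text{if } a_j^i(\in O_o) = 0 \text{ and } a_j^i(\in O_m) = 0 \\ 1 & \text{otherwise.} \end{cases}$$

XXVI DPCS Difference

$(O_o - O_m) = O_k$ where the power of $O_k$ = n and, $\forall i$ ($i = 1, \ldots, n$),

$$a_j^i(\in O_k) = \begin{cases} 0 & \text{if } a_j^i(\in O_o) = 0 \text{ or } (a_j^i(\in O_o) = 1 \text{ and } a_j^i(\in O_m) = 1) \\ 1 & \text{if } a_j^i(\in O_o) = 1 \text{ and } a_j^i(\in O_m) = 0. \end{cases}$$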

The following properties of DPCS, XXVII through XXXII, are
defined for completeness.

XXVII DPCS Power Set

Let X be a DPCS, then the power set of X is a set
containing all its subsets, i.e.

$P\{X\} = \{Y : Y \subseteq X\}$,

where Y is a DP set (not necessarily a DPCS).

XXVIII Power of a DPCS

The power of a DPCS is the number of unique elements
contained in the set.

XXIX Power of a DPCS Power Set

The power of a DPCS power set is identical to property IX
and XIX, i.e. if the power of a DPCS is T then the power
of its DPCS power set is $2^T$.

XXX Characteristic Function of a DPCS

The domain of a characteristic function (c.f.) for a
DPCS X is the power set of X. Its range is the set
{0,1}. The value of c.f. is dependent upon whether or
not an element contained in X is also an element of P{X}.
This can be shown as:

$$\text{c.f.}: X \rightarrow \{0,1\} \quad \text{where} \quad \text{c.f.}_{X_i}(x_j) = \begin{cases} 1 & \text{if } x_j \in X_i \\ 0 & \text{if } x_j \notin X_i, \end{cases}$$

where $X_i \subseteq X$ and $x_j \in X$ (Note $x_j$ = 0 or 1).

XXXI DPCS Identity Set

Define the Identity DPCS I as a Normal DP set
$I_k = \{1, 1, \ldots, 1\}$ for any $P\{I_k\} = k \geq 1$. Then $O_o \cup I_n = I_n$,
$O_o \cap I_n = O_o$ and $\forall n\,(O_o \subseteq I_n)$.

XXXII DPCS Null Set

Define the Null DPCS $\phi$ as a Normal DP set $\phi_k = \{0, 0, \ldots, 0\}$
for any $P\{\phi_k\} = k \geq 1$. Then $(O_o \cup \phi_n) = O_o$, $(O_o \cap \phi_n) = \phi_n$,
and $\forall n\,(\phi_n \subseteq O_o)$.


The properties XXI through XXVI, XXXI and XXXII can be
illustrated further by the following seven examples. Each example
can be related back to the previous example involving DPCSs.

Example 1. $B_1 \subseteq B_2$ (or $B_1 \subset B_2$), i.e. {0,0,0} $\subseteq$ {1,1,0} but
$B_2 \not\subset B_4$ (or $B_2 \not\subseteq B_4$), i.e. {1,1,0} $\not\subseteq$ {0,1,1}

Example 2. $B_1 \neq B_2$, i.e. {0,0,0} $\neq$ {1,1,0} but $B_1 = B_1$, i.e.
{0,0,0} = {0,0,0}.

Example 3. $B_3 \cap B_4 = B_7$, i.e. {1,0,1} $\cap$ {0,1,1} = {0,0,1}

Example 4. $B_4 \cup B_5 = B_8$, i.e. {0,1,1} $\cup$ {1,0,0} = {1,1,1}

Example 5. $B_8 - B_3 = B_6$, i.e. {1,1,1} $-$ {1,0,1} = {0,1,0}

Example 6. $B_8$ = (DPCS Identity Set), i.e. $B_8$ = {1,1,1}

Example 7. $B_1$ = (DPCS Null Set), i.e. $B_1$ = {0,0,0}.

I 

SUMMARY 

In this chapter a description of a mathematical base for the
modeling of the non-numerical functions of DBM has been presented.
This base is Data Processing (DP) sets and DP Characteristic Sets
(DPCS) joined with set theory and its properties.  The properties
of DP sets and DPCSs were defined with regard to set theory
properties.  Two DP set operators were presented (Push (P_s) and
Pull (P_l)) plus an extended property for a DP characteristic
function.  The chapter concluded with some examples of set theory
properties for DPCSs.


Chapter  V 

DATA  BASE  MANAGEMENT  AND  SETS 

INTRODUCTION 

How well can the mathematical base described in the previous
chapter model Data Base Management (DBM)?  The material presented
in this chapter provides an answer to this question.

The material is divided into four parts.  Each part
represents a level of data in DBM.  These parts are:

1)  the user computer interface (Reserved Word);

2)  the attribute and file or relationship (F/R) names
    (Data Name);

3)  the modifiers of the attribute and F/R names (Data
    Descriptors); and

4)  the occurrences of the attributes and F/Rs (Data
    Occurrence).

To provide clarity in the discussions of these four levels,
examples will be used throughout.  The basic example structure
shown in Fig. V-1 is from Date [12].  It defines a simple file or
relation (S) and a simple job (the GET statement) to be performed
against the occurrences of S.  Other examples are presented where
necessary.


S#    SNAME    STATUS    CITY

S1    SMITH    20        LONDON
S2    JONES    10        PARIS
S3    BLAKE    30        PARIS
S4    CLARK    20        LONDON
S5    ADAMS    30        ATHENS

DOMAIN    S#        Character (5)
          SNAME     Character (20)
          STATUS    Numeric (3)
          CITY      Character (15)

RELATION  S (S#, SNAME, STATUS, CITY)

KEY  (S#)

GET  W(S.S#, S.STATUS): S.CITY = 'LONDON'


Figure V-1.  Suppliers Data Model

USER COMPUTER INTERFACE

The user level of DBM is involved with a high level language
which assists in communicating with the computer.  The user
communicates by forming words, numerics, expressions, instructions,
etc. by putting together those alphanumeric symbols available on
a key punch machine, teletype terminal, etc.  These alphanumeric
symbols are converted to mechanical and/or electrical codes and
interpreted by the computer.  The computer then performs one or
more functions and responds by issuing electrical codes to a
terminal, printer, etc., which can be converted to alphanumeric
symbols for the user to interpret.

SYMBOLIC  LANGUAGE 

The  symbolic  language  that  is  utilized  to  communicate  with 
the  computer  will  be  modeled  first.  Let  a set  Alphanumeric  (AN) 
be  defined  as  unique  elements  composed  of  those  symbols  that  are 
available  on  a key  punch  machine,  etc.  and  are  interpretable  by 
a computer.  For  example, 

AN = {A, B, ..., Z, 0, 1, ..., [, ], ?, ...}.

A simple expression such as "open file" can be modeled by a Normal
DP set (X) where Y ∈ P(AN), i.e. Y = {O,P,E,N,ƀ,F,I,L} (ƀ denoting
a blank) and X = (OPENƀFILE) = {O^1,P^2,E^3,N^4,ƀ^5,F^6,I^7,L^8,E^9}.
The general model for communicating at the user's level is a DP set
defined on a "permutation, with replacement," of an element
contained in the power set of AN.

To model the above expression at a lower level, a computer's
word size can be taken into consideration.  Assume the power of AN
is equal to 64 and the computer word size is six characters long.
Then, as was shown in Chapter IV, the maximum number of Normal DP
sets that can be created is:

T=6 

t 

i=0 

Two of these Normal DP sets would be X_1 = {O,P,E,N,ƀ,ƀ} and
X_2 = {F,I,L,E,ƀ,ƀ}.  To form the necessary expression, let

Y_1 = X_1 - {ƀ^6},

Y_2 = X_2 - {ƀ^5, ƀ^6},

and

X = Y_1 ∪ P_s^5(Y_2) = {O,P,E,N,ƀ,F,I,L,E}.

From  this  example  it  can  be  seen  that  DP  sets  allow  for  the 
capability  to  describe  character  and  word  manipulation  to  describe, 
for  example,  data  base  structures,  parsing,  etc. 
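
The word-assembly step above can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions, not
the thesis's own implementation: a Normal DP set is represented as a
dictionary from position to symbol, the Push operator is assumed to
add a fixed amount to every position, and the helper names
(normal_dp_set, push, remove_positions, union, as_text) are
hypothetical.

```python
# Assumed representation: a Normal DP set is a dict {position: symbol},
# with positions starting at 1.

def normal_dp_set(text):
    """Build a Normal DP set whose elements occupy positions 1..len(text)."""
    return {pos: ch for pos, ch in enumerate(text, start=1)}

def push(dp_set, n):
    """Push (assumed semantics): move every element n positions higher."""
    return {pos + n: ch for pos, ch in dp_set.items()}

def remove_positions(dp_set, positions):
    """Set difference restricted to the given positions."""
    return {pos: ch for pos, ch in dp_set.items() if pos not in positions}

def union(*dp_sets):
    """Union of DP sets; two different symbols may not share a position."""
    merged = {}
    for dp_set in dp_sets:
        for pos, ch in dp_set.items():
            if pos in merged and merged[pos] != ch:
                raise ValueError(f"position {pos} is already occupied")
            merged[pos] = ch
    return merged

def as_text(dp_set):
    """Read the elements back in position order."""
    return "".join(dp_set[pos] for pos in sorted(dp_set))

# Two six-character machine words, padded with blanks.
x1 = normal_dp_set("OPEN  ")
x2 = normal_dp_set("FILE  ")

y1 = remove_positions(x1, {6})       # keep one blank as the separator
y2 = remove_positions(x2, {5, 6})    # drop both trailing blanks
x = union(y1, push(y2, 5))           # FILE now occupies positions 6..9
```

Under these assumptions as_text(x) reads "OPEN FILE", and the positions
of x run from 1 through 9 with no gaps.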

RESERVED  WORDS 

As  shown  above,  the  words  that  can  be  created  are  quite 
numerous.  A small  number  of  these  words  are  reserved  words  and 
are  keyed  upon  by  the  DBM  System  (DBMS).  It  is  the  utilization  of 
these  words  in  the  proper  sequences  (syntax)  that  describes  to  the 
DBMS  which  job  is  to  be  performed.  A method  of  how  these  words 
could  be  managed  by  a DBMS  will  be  discussed  and  modeled  using 
sets  and  DP  sets. 

In the example above, the word "open" can be assumed to be
one of these reserved words.  The words following a reserved word
or phrase are usually limited and are dictated by syntactical rules.
For this same example, the DBMS would expect the next word
following "open" to be a file or relationship name.  The DBMS would
then pass this name to the next level of processing.  Before that
is discussed, however, a model using DP sets will be described
which attempts to give a general view of how a DBMS might utilize
these reserved words and the syntactical information associated
with them.

Intra-statement

The implementation of the reserved words and their modifiers
can be modeled quite simply as a Normal DP set.  Each element in
this Normal DP set is also a Normal DP set composed of two or
more elements.  The first element is the reserved word followed
by all of its modifiers.  This can be described as shown below
(without commas between elements):

{ {ReservedWord/Modifier1/Modifier2/...},
  {ReservedWord/Modifier1/Modifier2/...}, ...,
  {ReservedWord/Modifier1/Modifier2/...} }.

For the example shown in Fig. V-1, the Normal DP sets may look
like the following for intra-statement syntax:

IA = {... {Domain/(AttributeNames/(type(size))!)} ...
          {Relation/RelationName/(AttributeNames!)} ...
          {Get/RelationName/(RelationName.AttributeName!) ...
              '/Value/'} ...
          {Key/(AttributeName)} ...}.

A / has been arbitrarily chosen as a word delimiter, an exclamation
point within parentheses signifies that the contents of the
parentheses may be repeated, and the first word of each element
in the DP set is the reserved word.  These choices are arbitrary
and should be viewed as an example, not a recommended procedure.

Inter-statement

The inter-statement syntactical rules are the rules between
reserved words.  These rules specify which reserved words
can immediately follow a particular reserved word.  This may be
modeled as a Normal DP set.  Each element in this Normal DP set
may also be modeled as a Normal DP set where the first element is
a reserved word followed by its related reserved words, e.g.
{A/C/D} signifies that reserved words D and C may follow reserved
word A.  The format of this Normal DP set of inter-statement
rules can be defined similar to the above as:

IR = { {ReservedWord/(ReservedWord!)}, ...,
       {ReservedWord/(ReservedWord!)} }.

The number of elements in IR is equal to the number of reserved
words in the user's high level language.


Utilization 

After a job stream has been parsed, the first DP set (IA)
would be used to detect any syntactical errors within the
statement and to determine, by reserved names, which functions need
to be performed (e.g., boolean operators, write, delete, retrieve,
etc.).  The DBMS then determines which "words" are supposed to
be attribute names or values and which are F/R names.  The attribute
names and F/R names would then be passed to the data dictionary
and directory for further processing.  The second DP set (IR)
discussed above is utilized to determine if the n-th reserved
word in the statements can legally follow the (n-1)st reserved
word.  This is accomplished by finding the list of reserved words
that can legally follow the (n-1)st reserved word and determining
if the n-th word or its equivalent is in the list.  If so, then
the DBMS knows that reserved word n can follow reserved word n-1.
This same procedure would be performed for the second through
the last reserved word in the user's statement sequence.
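
The inter-statement check just described can be sketched as follows.
The sketch is illustrative only: IR is held as a Python dictionary
rather than as a Normal DP set, the follower lists are invented for
the example (the text does not give them), and the function name
first_illegal_follow is hypothetical.

```python
# Hypothetical inter-statement rules: reserved word -> reserved words
# that may immediately follow it.  The actual follower lists are
# language dependent and are assumed here for illustration.
IR = {
    "DOMAIN": {"DOMAIN", "RELATION"},
    "RELATION": {"KEY"},
    "KEY": {"RELATION", "GET"},
    "GET": {"GET"},
}

def first_illegal_follow(reserved_words, rules):
    """Return the index n of the first reserved word that may not follow
    word n-1, or None when the second through last words are all legal."""
    for n in range(1, len(reserved_words)):
        previous, current = reserved_words[n - 1], reserved_words[n]
        if current not in rules.get(previous, set()):
            return n
    return None

job = ["DOMAIN", "RELATION", "KEY", "GET"]
result = first_illegal_follow(job, IR)
```

With the assumed rules, the job sequence above passes the check, while
a sequence such as DOMAIN followed directly by KEY is rejected at the
second word.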

The first acceptable reserved word in a job can be modeled
as a Normal DP set where its elements are DP sets with two
elements.  The first element contains the reserved word's name
and the second element is a pointer to the location where the
reserved word's entry is located in the physical implementation
of IR.  An example of the format of this first inter-statement
Normal DP set is

FIR = { {ReservedWord/PointertoIR}, ...,
        {ReservedWord/PointertoIR} }.

FIR allows the DBMS to determine if the beginning of a new
job has a proper reserved word and provides a connection to IR,
to be used for the second reserved word in the user's statement
sequence, as discussed above.

ATTRIBUTE AND FILE/RELATIONSHIP (F/R)

Once the DBMS has determined which job(s) have to be
performed, through parsing and intra-statement and inter-statement
syntax checking, its next consideration is with the data.  The
first level of description concerned with actual data occurrences
has to do with attribute and F/R names.  Modeling this level of
data by using sets and DP sets is considered here.  This is
followed by an example to illustrate what detail this mathematical
base can provide to DBM.

ATTRIBUTES 

Assume a data base consists of a number of attributes.  Then
these attributes can be modeled as a set A where the total number
of attributes (TA) is the power of A and each element in A is the
attribute's name A_i.  A_i can be modeled as a permutation, with
replacement, performed on an element of P(AN).  Each attribute
name, A_i, can be modeled as a Normal DP set A_i.  Therefore,
symbolically the data base consists of the set of attributes

A = {A_1, A_2, ..., A_TA}

where each has a Normal DP set A_i such that no two attribute
names are equivalent.  An example of two Normal DP set models of
attribute names might be A_i = {E,M,P,L,O,Y,E,E,ƀ,N,A,M,E} and
A_j = {S,.,S,.,N,O}, the latter representing a code for social
security number.

The TA attributes in the data base have unique names but
some of them may have synonym attributes.  Two or more attributes
are synonyms if they have different names and descriptors or
different names for the same entity in the real world.  An
example of two synonym attribute names could be
A_i = {E,M,P,L,O,Y,E,E,N,U,M,B,E,R} and A_j = {S,.,S,.,N,O}.  These
synonym attributes may occur in many ways in forming large data
bases.  For example, they may result from interfacing more than one
current data base, adding data to existing large data bases,
redefining portions of existing data bases, etc.


FILE/RELATIONSHIP (F/R)

DP sets can be used to model F/Rs as collections of
attributes.  An F/R for a data base can be modeled as a Normal DP
set defined on a permutation, with replacement, of an element
contained in P(A).  Given the set A, the set of all its subsets
P(A) has a power of (see Chapter IV)

\sum_{i=0}^{TA} \binom{TA}{i} = 2^{TA}.

The elements or sets contained in P(A) are the total number of
subsets of A in which F/Rs of attributes can be defined.  An
example of an F/R, expressed as a Normal DP set (O_i), might be

O_i = {N,A,M,E,ƀ,A,D,D,R,E,S,S,ƀ,P,H,O,N,E,ƀ,N,O}
    = A_i ∪ {ƀ^5} ∪ P_s^5(A_j) ∪ {ƀ^13} ∪ P_s^13(A_k)

where

A_i = {N,A,M,E}, A_j = {A,D,D,R,E,S,S}, and
A_k = {P,H,O,N,E,ƀ,N,O}.

The name of an F/R may be modeled in a similar way where
θ_i is a permutation, with replacement, of an element contained
in P(AN).  For this example θ_i would be the Normal DP set
θ_i = {T,E,L,E,P,H,O,N,E,ƀ,L,I,S,T,I,N,G}.

The notion of order was utilized for modeling F/Rs because
it provides a definitive structure by extending the concept of
relations and it more closely resembles a computer representation.
To more accurately depict a relation of two or more attributes
the notion of order must be introduced.  For example, let
θ_k = {A,N,C,E,S,T,O,R} and θ_j = {D,E,C,E,N,D,E,N,T} be Normal DP
sets but let A_n, A_{n+1}, O_k, and O_j be SETS where

A_n     = {P,A,R,E,N,T,S,ƀ,N,A,M,E},
A_{n+1} = {C,H,I,L,D,S,ƀ,N,A,M,E},
O_k     = {A_n, A_{n+1}}, and
O_j     = {A_{n+1}, A_n}.

Therefore, O_k = O_j.  Note that "a parent is the ancestor of a
child" is not equivalent to "a parent is the descendent of a
child."  Modeling the above two F/Rs using Normal DP sets
generates:

O_k = {PARENTSƀNAMEƀCHILDSƀNAME}

O_j = {CHILDSƀNAMEƀPARENTSƀNAME}.

Modeled using DP sets, the example illustrates that O_k ≠ O_j,
since if they were equal, then this would imply that A_n = A_{n+1}
and A_{n+1} = A_n, on a character per character basis, which is a
contradiction.

EXAMPLE OF USER'S LEVEL DATA

This modeling of F/Rs and their modifiers as DP sets falls
into the third level of data utilized in communicating with a
DBMS.  This level involves which attributes are contained in which
F/R and what is the user's view of that F/R.  Other models exist
and were discussed earlier, e.g. tree structures, networks,
relational data structures, etc.  The DP set model is general
enough to model all of the above models, as shown in Appendix A.

The fourth level of data utilized with a DBMS concerns the
occurrences of the attributes and/or F/Rs.  Each attribute A_i
has stored in the data base one or more occurrences, designated
A_{i,j}, where the subscript i signifies the attribute in question
and the subscript j signifies which occurrence of attribute i
(A_i).  As an example, consider the attribute A_i being equated
to {C,I,T,Y}; then a typical occurrence may be LONDON, PARIS, etc.
Occurrences of attributes, as seen by the user, are modeled in
the same way as θ_i and A_i, i.e. A_{i,j} is a Normal DP set defined
on a permutation, with replacement, of an element contained in
P(AN).

In concluding the discussion of the first two levels of
data in DBM, an example problem is given which illustrates the
DP set modeling in the static and dynamic state.

SUPPLIERS DATA MODEL

The Suppliers Data Model shown in Fig. V-1 provides an
example for displaying the versatility of how sets and DP sets
can be used to model DBM at the user level.  The model depicts a
small data base with four different attributes, which are listed
under DOMAIN, and one F/R called S, with a primary key (S#) used
to identify unique occurrences of S.  Note that all underlined
words, parentheses, colons, etc. are reserved words in the
language used to communicate with the DBMS.  These underlined
reserved words were modeled earlier in the chapter using DP
sets (e.g. IA and IR).

Static  State 

The small data base could be constructed at the user's
level using Normal DP sets in the following way:

Attribute Names (Normal DP Sets)

A_1 = {S,#}

A_2 = {S,N,A,M,E}

A_3 = {S,T,A,T,U,S}

A_4 = {C,I,T,Y}

F/R Name (Normal DP Set)

θ_1 = {S}

F/R Definition (Normal DP Set)

O_1 = A_1 ∪ {ƀ^3} ∪ P_s^3(A_2) ∪ {ƀ^9} ∪ P_s^9(A_3)
      ∪ {ƀ^16} ∪ P_s^16(A_4).

O_1 = {S#ƀSNAMEƀSTATUSƀCITY}.

The data, or occurrences of the attributes and the F/R, can be
defined in one of two ways.  One way would be to define each
attribute's values first and then the proper values of each
attribute could be "put together" to form the correct occurrences
of the F/R.  Another way would be to define the data by occurrences
of the F/R.

The first way can be modeled as the following:

A_{1,1} = {S,1,ƀ,ƀ,ƀ},
A_{1,2} = {S,2,ƀ,ƀ,ƀ},
A_{1,3} = {S,3,ƀ,ƀ,ƀ},
A_{1,4} = {S,4,ƀ,ƀ,ƀ},
A_{1,5} = {S,5,ƀ,ƀ,ƀ},
A_{2,1} = {S^1,M^2,I^3,T^4,H^5,ƀ^6,...,ƀ^20},
A_{2,2} = {J^1,O^2,N^3,E^4,S^5,ƀ^6,...,ƀ^20},
A_{2,3} = {B^1,L^2,A^3,K^4,E^5,ƀ^6,...,ƀ^20},
A_{2,4} = {C^1,L^2,A^3,R^4,K^5,ƀ^6,...,ƀ^20},
A_{2,5} = {A^1,D^2,A^3,M^4,S^5,ƀ^6,...,ƀ^20},
A_{3,1} = {2,0,ƀ},
A_{3,2} = {1,0,ƀ},
A_{3,3} = {3,0,ƀ},
A_{4,1} = {L^1,O^2,N^3,D^4,O^5,N^6,ƀ^7,...,ƀ^15},
A_{4,2} = {P^1,A^2,R^3,I^4,S^5,ƀ^6,...,ƀ^15}, and
A_{4,3} = {A^1,T^2,H^3,E^4,N^5,S^6,ƀ^7,...,ƀ^15}.

Given the above occurrences of (A_{i,j})'s then the following
equations define each occurrence of O_1 according to the first
way mentioned above:

O_{1,1} = A_{1,1} ∪ P_s^5(A_{2,1}) ∪ P_s^25(A_{3,1}) ∪ P_s^28(A_{4,1})

O_{1,2} = A_{1,2} ∪ P_s^5(A_{2,2}) ∪ P_s^25(A_{3,2}) ∪ P_s^28(A_{4,2})

O_{1,3} = A_{1,3} ∪ P_s^5(A_{2,3}) ∪ P_s^25(A_{3,3}) ∪ P_s^28(A_{4,2})

O_{1,4} = A_{1,4} ∪ P_s^5(A_{2,4}) ∪ P_s^25(A_{3,1}) ∪ P_s^28(A_{4,1})

O_{1,5} = A_{1,5} ∪ P_s^5(A_{2,5}) ∪ P_s^25(A_{3,3}) ∪ P_s^28(A_{4,3}).

The second way requires that some of the attribute's
occurrences ((A_3)'s and (A_4)'s) be duplicated since the data
are being defined by occurrences of the F/R rather than the
occurrences of the attributes.  Therefore, the occurrences of
the A_3 and A_4 attributes are redefined as:

A_{3,1} = {2,0,ƀ}
A_{4,1} = {L^1,O^2,N^3,D^4,O^5,N^6,ƀ^7,...,ƀ^15}
A_{3,2} = {1,0,ƀ}
A_{4,2} = {P^1,A^2,R^3,I^4,S^5,ƀ^6,...,ƀ^15}
A_{3,3} = {3,0,ƀ}
A_{4,3} = {P^1,A^2,R^3,I^4,S^5,ƀ^6,...,ƀ^15}
A_{3,4} = {2,0,ƀ}
A_{4,4} = {L^1,O^2,N^3,D^4,O^5,N^6,ƀ^7,...,ƀ^15}
A_{3,5} = {3,0,ƀ}
A_{4,5} = {A^1,T^2,H^3,E^4,N^5,S^6,ƀ^7,...,ƀ^15}

With these modifications to the data then the (O_{1,j})'s can be
represented as:

O_{1,j} = A_{1,j} ∪ P_s^5(A_{2,j}) ∪ P_s^25(A_{3,j}) ∪ P_s^28(A_{4,j}).

These two ways illustrate the differences encountered by defining
the data for an F/R with duplicating identical occurrences of
attributes and without duplicating these occurrences.
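
As a rough illustration of how an occurrence of S is laid out, the
sketch below builds a fixed-width record from the domain sizes of
Fig. V-1 (5, 20, 3, and 15 characters).  It assumes that each
attribute occurrence is padded with blanks to its declared size and
that the starting offset of each attribute plays the role of the Push
amount; the names FIELDS, offsets, and occurrence are hypothetical.

```python
# Attribute names and declared sizes, taken from the DOMAIN entries
# of Fig. V-1.
FIELDS = [("S#", 5), ("SNAME", 20), ("STATUS", 3), ("CITY", 15)]

def offsets(fields):
    """Starting offset of each attribute inside an occurrence of the F/R."""
    result, start = {}, 0
    for name, width in fields:
        result[name] = start
        start += width
    return result

def occurrence(values):
    """Concatenate blank-padded attribute occurrences into one record."""
    return "".join(str(values[name]).ljust(width) for name, width in FIELDS)

record = occurrence(
    {"S#": "S1", "SNAME": "SMITH", "STATUS": 20, "CITY": "LONDON"}
)
layout = offsets(FIELDS)
```

Under these assumptions the offsets come out as 0, 5, 25, and 28, so
that LONDON occupies character positions 29 through 34 of the record.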


Dynamic  State 

To further illustrate the two definitions of this data
base, consider the interpretation of the GET statement, i.e.

GET W(S.S#, S.STATUS): S.CITY = 'LONDON'.

This can be interpreted as the defining of a new F/R: θ_2 = {W}.
This interpretation therefore defines O_2 = A_1 ∪ {ƀ^3} ∪ P_s^3(A_3).
Obtaining the values for the attributes in O_2 is, however,
dependent on S.CITY = 'LONDON'.  The GET statement is requesting
the DBMS to create an F/R named W having two attributes whose names
are S# and STATUS.  It is to provide occurrences for this F/R based
upon the value of the attribute CITY.  That is, for each occurrence
in F/R S, if the attribute CITY has an occurrence equal to
LONDON then the DBMS is to retrieve the corresponding values of
the occurrences of S# and STATUS and assign them an occurrence in
the F/R named W.  This procedure can be expressed using DP sets:

Set k = 0

∀j [if (O_{1,j} ∩ P_s^28{L^1,O^2,N^3,D^4,O^5,N^6})
        = {L^29,O^30,N^31,D^32,O^33,N^34}
    then for k = k + 1, set O_{2,k} = A_{1,j} ∪ P_s^5(A_{3,j})].

If the second method for defining data were used in loading the
original data then O_{2,k} = A_{1,j} ∪ P_s^5(A_{3,j}) will not cause
any problems.  But, if the first method discussed were used in
loading the original data, then an error would occur since A_{3,4}
does not exist.  To correct this, O_{2,k} in the above expression
needs to be redefined as:

°2,k  “ (°1,J  " (°l,j  * J*  •••» 

u - <°1,3  * <3*  I^3») 
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where  XT  . is  the  £-th  element  for  (O.)'s  j-th  occurrence.  The 
result of this illustration would yield the following F/R, W,
as shown by Date [12]:

W     S#    STATUS

      S1    20
      S4    20
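
The effect of the GET statement on the data of Fig. V-1 can be checked
with a short sketch.  It works on attribute values directly rather
than on DP sets, so it shows only the expected result, not the DP set
mechanics; the function name get and its parameters are hypothetical.

```python
# Occurrences of the relation S from Fig. V-1.
S = [
    {"S#": "S1", "SNAME": "SMITH", "STATUS": 20, "CITY": "LONDON"},
    {"S#": "S2", "SNAME": "JONES", "STATUS": 10, "CITY": "PARIS"},
    {"S#": "S3", "SNAME": "BLAKE", "STATUS": 30, "CITY": "PARIS"},
    {"S#": "S4", "SNAME": "CLARK", "STATUS": 20, "CITY": "LONDON"},
    {"S#": "S5", "SNAME": "ADAMS", "STATUS": 30, "CITY": "ATHENS"},
]

def get(relation, targets, attribute, value):
    """For every occurrence whose attribute equals value, keep only the
    target attributes, in the order the occurrences are stored."""
    return [
        tuple(row[name] for name in targets)
        for row in relation
        if row[attribute] == value
    ]

# GET W(S.S#, S.STATUS): S.CITY = 'LONDON'
W = get(S, ("S#", "STATUS"), "CITY", "LONDON")
```

W then holds the two occurrences (S1, 20) and (S4, 20), matching the
table above.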

MODIFIERS OF ATTRIBUTES AND FILE/RELATIONSHIPS

The  next  level  of  modeling  concerns  the  DBMS's  overhead  data 
for  insuring  that  the  data  descriptions  are  utilized  correctly  and 
that  the  correct  occurrences  are  retrieved.  This  function  can  be 
performed  through  the  utilization  of  a data  dictionary  and  a 
data  directory.  They  will  be  described  in  detail,  related  to  the 
sample  problem  and  modeled  using  sets  and  DP  sets. 

DATA  DICTIONARY 

A data dictionary provides data about the attributes in the
data base.  These data are needed by the DBMS for the eventual
processing of the attribute's occurrences.  Examples of the
kinds of data that may be contained in a data dictionary are:

1)  representation,

2)  format,

3)  size, and

4)  synonym data.

Once a job has been received by a DBMS, parsed, and checked for
syntax errors, the DBMS then accesses these data contained in the
data dictionary.

REPRESENTATION 

The representation technique for an attribute's occurrence
determines how the computer will represent and treat it in binary
form.  The modeling of a data dictionary concerning a representation
reverts back to the definition of the set A.  This set can be
partitioned into t mutually exclusive sets depending on what the
user has defined to the computer as the representation technique
required for the occurrences of each of the attributes.  This
variable t is computer dependent and represents the power of the
set, X, of different representation techniques available on a
computer, i.e., X = {f.p., i, b, al, etc.}, where f.p. represents
floating point, i represents integer, b represents boolean, and
al represents alphanumeric.  If A_i, A_j, ..., A_x are attributes
whose occurrences are represented as floating point then let

F.P. = {A_i, A_j, ..., A_x}.

Similarly, the set A can be partitioned into sets I, B, and AL,
where I, B, AL, and F.P. are mutually exclusive.  This implies
that no attribute's occurrences can be represented in more than
one way and therefore A = {F.P. ∪ AL ∪ B ∪ I}.
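
A partition of this kind can be checked mechanically.  The sketch
below groups the four attributes of Fig. V-1 by representation
(Character taken as al and Numeric as i, which is an assumption) and
verifies that the resulting subsets are mutually exclusive and
together cover A; the names REPRESENTATION, partition, and
is_partition are hypothetical.

```python
from itertools import combinations

# Assumed representation of each attribute in Fig. V-1.
REPRESENTATION = {"S#": "al", "SNAME": "al", "STATUS": "i", "CITY": "al"}

def partition(attribute_representation):
    """Group attribute names into one subset per representation."""
    blocks = {}
    for attribute, representation in attribute_representation.items():
        blocks.setdefault(representation, set()).add(attribute)
    return blocks

def is_partition(blocks, universe):
    """True when the subsets are pairwise disjoint and cover the universe."""
    subsets = list(blocks.values())
    disjoint = all(a.isdisjoint(b) for a, b in combinations(subsets, 2))
    covers = set().union(*subsets) == set(universe)
    return disjoint and covers

blocks = partition(REPRESENTATION)
valid = is_partition(blocks, REPRESENTATION.keys())
```

Because every attribute is assigned exactly one representation, the
check succeeds; listing an attribute in two subsets would make it
fail.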

Format  and  Size 

Within each of these subsets of A discussed above, other
subsets can be formed.  These subsets are mutually exclusive and
are based on the definition of the format and/or the maximum size
of the occurrences of each attribute, A_i.  If the subset is AL
then it can also be partitioned into subsets.  The criteria for
this partitioning can be the maximum size of the attribute's
occurrences, e.g. subsets of one character long (SAL1), two
characters long (SAL2), etc.  If, however, the subset of A is
F.P., then it can be partitioned in two different ways: by
equivalent formats and by size.  Partitioning by equivalent formats
implies, for example, that all attributes whose occurrences have a
floating point format of F7.3 would be defined as one subset and
all attributes whose occurrences have a floating point format of
F7.2 would be defined as another subset, thus

(F7.3) = {A_i, A_j, ..., A_k}

and

(F7.2) = {A_l, A_m, ..., A_n}.

This example implies, for instance, that A_i is represented as a
floating point attribute where there are 7 characters reserved
for its occurrence, there are 3 characters reserved for those
numbers right of the decimal point and 4 (7-3=4) characters
reserved for those numbers left of the decimal point.  Note that
(F7.3) ∩ (F7.2) = ∅ and similarly (SAL1) ∩ (SAL2) = ∅, which
implies that no one attribute can have its occurrences defined
with two different formats or sizes.
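
The arithmetic behind a format descriptor such as F7.3 can be made
explicit with a small sketch.  It follows the counting used in the
text (total characters minus characters right of the decimal point)
and makes no claim about how any particular compiler treats the
decimal point or sign; the function name parse_format is hypothetical.

```python
import re

def parse_format(descriptor):
    """Split a descriptor such as 'F7.3' into (total characters,
    characters right of the decimal point, characters left of it)."""
    match = re.fullmatch(r"F(\d+)\.(\d+)", descriptor)
    if match is None:
        raise ValueError(f"not a floating point format: {descriptor!r}")
    total, right = int(match.group(1)), int(match.group(2))
    return total, right, total - right

f73 = parse_format("F7.3")
f72 = parse_format("F7.2")
```

For F7.3 this gives 7 characters in total, 3 to the right and 4 to the
left; for F7.2 it gives 7, 2, and 5, so the two formats describe
different subsets even though their sizes agree.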

The other way in which the subset F.P. can be partitioned
is by size.  This can be accomplished similarly as for the
subset AL; i.e. F.P. can be divided into subsets of maximum
sizes of two characters long (S F.P.2), etc.  For the
above example the subset of seven characters long would be

((F7.3) ∪ (F7.2) ∪ (F7.0) ∪ (F7.1) ∪ (F7.4)
    ∪ (F7.5) ∪ (F7.6)) = {A_i, ..., A_k, A_l, ..., A_n, ...}.

Note that for these examples, size was measured in characters;
the same could have been accomplished by measuring the size in
bits.

Synonyms 

Two or more attributes are synonyms if they have different
names and descriptors or different names for the same entity in
the real world.  In the sharing of large data bases, there are
certain attributes used by different jobs that necessitate
having the same attribute's occurrences defined differently.
This requirement is one of the reasons for having synonym
attributes in a data base.  Since synonym attributes have
different names, they may have a different representation
and/or a different format, and/or a different size.  An example
of this might be the following: A_i = S.S.# and A_j = S.S.No., where
A_i may be represented as an integer and A_j may be represented
as an alphanumeric.  One occurrence of A_j and A_i might be modeled
using Normal DP sets as

A_{j,k} = {0,0,1,4,7,8,2,0,8}

A_{i,k} = {1,4,7,8,2,0,8}

where the actual employee's social security number appearing on
his/her employment card would be "001-47-8208."
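
The relationship between the two synonym occurrences can be
illustrated with a pair of conversions.  This sketch assumes the
integer form simply drops leading zeros and the alphanumeric form is
nine characters wide; the function names are hypothetical.

```python
def to_alphanumeric(number, width=9):
    """Integer occurrence -> alphanumeric occurrence with leading zeros."""
    return str(number).zfill(width)

def to_integer(text):
    """Alphanumeric occurrence -> integer occurrence."""
    return int(text)

def as_card(text):
    """Format the nine characters as they appear on the card."""
    return f"{text[:3]}-{text[3:5]}-{text[5:]}"

alphanumeric = to_alphanumeric(1478208)
card = as_card(alphanumeric)
```

Here the integer 1478208 becomes the alphanumeric occurrence
001478208, which is printed on the card as 001-47-8208.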

Dictionary  Construction 

The actual construction of a data dictionary is problem
dependent.  For instance, if the attribute's occurrences have
maximum fixed sizes, then these values will probably appear in the
dictionary since they are constant modifiers or size descriptor
values.  However, if the size descriptors of an attribute's
occurrences are variable, i.e., are different for each occurrence
of an attribute, then the value for the size descriptor will
probably not appear in the data dictionary but will appear in
the data directory.  This is so since it modifies the occurrences
as they are stored on the computer more than the general definition
of an attribute's occurrences.  If, for instance, a data base
only has one representation and format descriptor (e.g. al)
then there would be no need to include the value for a
representation for each attribute in the data dictionary.

The last entry that may appear in a data dictionary is
related to descriptions, functions, and synonyms.  If a data base
has synonyms then there are two major ways in which the synonyms
can be handled:

1)  (Type I synonyms), store the occurrences of the
    synonyms in only one description and develop
    additional functions to convert the different
    occurrences from one description to another; and

2)  (Type II synonyms), store the occurrences of the
    synonyms in their different descriptions.

There are, however, advantages and disadvantages to both of the
above approaches.  For instance, the first approach enlarges the
data dictionary and requires more processing just to retrieve
information.  The second approach increases the size of the data
base and causes a problem with data integrity in that the same
occurrences are stored multiple times in the computer under
different attribute names.  The functions required by the first
approach are all different because their domains and ranges are
different.  The domain and range might be the sets (F9.2) and
(SI3) (Integer, 3 characters long) where (F9.2) might be the
format for A_i = ANNUAL SALARY and (SI3) might be the format for
A_j = NEAREST K$ SALARY.  Another example might be A_i = S.S.#
and A_j = S.S.No. where the range and domain could be (SI8) and
(SAL8), respectively.
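
A Type I conversion function for the salary example might look like
the sketch below.  The rounding rule (half rounds up) is an
assumption, since the text does not specify one, and the function
name nearest_k_dollars is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def nearest_k_dollars(annual_salary):
    """Convert an ANNUAL SALARY occurrence (format F9.2, given as text)
    to a NEAREST K$ SALARY occurrence (an integer number of thousands)."""
    thousands = Decimal(annual_salary) / Decimal(1000)
    return int(thousands.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

low = nearest_k_dollars("23499.99")
high = nearest_k_dollars("23500.00")
```

Only ANNUAL SALARY would be stored under this approach; the synonym's
occurrences are computed on retrieval, which is the extra processing
the text refers to.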

Dictionary  Model 

The  modeling  of  a data  dictionary  requires  at  least  a 

unique  and  shortened  Normal  DP  set  for  each  attribute  name  (A. ) 

J 

and  F/R  (6^.  The  rest  of  what  is  contained  in  a data 
dictionary  is  problem  dependent  and  could  contain  some,  all,  or 
more  than  the  elements  in  the  following  DP  set  model  of  the 
i-th  entry  of  a data  dictionary: 


DIC1  „ (ACj  U F^AKj  ) U Ffy(CD)  U F^y+*(St) 
U P“*+y"(*Cp)  U 
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where  AC.  - Normal  DP  set  of  the  attribute's  shortened  name, 

J 

AN.  - Normal  DP  set  of  the  attribute's  name, 

J 

CD  - Normal  DP  set  of  the  representation  technique 
descriptor, 

S.  - Normal  DP  set  of  the  size  descriptor, 

AC  - Normal  DP  set  of  the  attribute's  synonym  shortened 
P 

name, 

- Normal  DP  set  of  the  name  of  the  function  to 

convert  the  occurrences  of  the  synonyms,  and 

x,  y,  z,  and  w are  the  powers  of  the  Normal  DP  sets  AC.  (and 

J 

AC  ),  AN.,  CD,  and  S.,  respectively. 

P J T- 

This  model  refers  to  the  i-th  occurrence  in  the  data  dictionary 
containing  data  about  an  attribute  in  the  data  base.  If,  however, 
the  i-th  element  in  the  data  dictionary  contains  data  about  an 
F/R,  then  a model  using  Normal  DP  sets  could  be  the  following: 


$$
\mathrm{DIC}_i = \{ AC_j \cup P^{x}(AN_j) \cup P^{x+y}(S_t) \cup P^{x+y+w}(AC_p) \}
$$

where AC_j - Normal DP set of the F/R's shortened name,

      AN_j - Normal DP set of the F/R's name,

      S_t  - Normal DP set of a pointer to the F/R's hashing
             algorithm and/or directory, and

      AC_p - Normal DP set of the primary key's shortened name.
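The two models above can be illustrated with a small sketch. It
assumes, since Chapter IV is not reproduced here, that a Normal DP
set is a set of (symbol, position) pairs and that P^x shifts every
position by x; the function names are hypothetical and delimiters
are left out, as they are in the model.

```python
def normal_dp_set(symbols):
    """Tag each symbol with its 1-based position: a set of (symbol, position) pairs."""
    return {(symbol, position) for position, symbol in enumerate(symbols, start=1)}


def power(dp_set):
    """The power of a DP set is taken here to be its number of elements."""
    return len(dp_set)


def shift(dp_set, x):
    """Assumed reading of P^x: move every element x positions to the right."""
    return {(symbol, position + x) for symbol, position in dp_set}


def dictionary_entry(*parts):
    """Union the parts, shifting each by the combined power of the parts before it."""
    entry, offset = set(), 0
    for part in parts:
        entry |= shift(part, offset)
        offset += power(part)
    return entry


def as_sequence(dp_set):
    """List the symbols in position order."""
    return [symbol for symbol, _ in sorted(dp_set, key=lambda pair: pair[1])]


ac_j = normal_dp_set("3")        # shortened name
an_j = normal_dp_set("STATUS")   # attribute name
cd = normal_dp_set("i")          # representation descriptor
s_t = normal_dp_set("3")         # size descriptor

entry = dictionary_entry(ac_j, an_j, cd, s_t)
print(as_sequence(entry))
```

Under these assumptions the name occupies positions 2 through 7,
which is the offset x = 1 contributed by the one-character
shortened name.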


Dictionary  Example 

As  an  example,  consider  the  Suppliers  Data  Model  from 
Date  [12],  where  the  data  dictionary  could  be  modeled  as 


DIC_1 = {1,/,S,#,/,a,l,/,5}

DIC_2 = {2,/,S,N,A,M,E,/,a,l,/,2,0}

DIC_3 = {3,/,S,T,A,T,U,S,/,i,/,3}

DIC_4 = {4,/,C,I,T,Y,/,a,l,/,1,5}

DIC_5 = {5,/,S,/,X,X,/,1}

where / (slashes) are used as delimiters; 1, 2, 3, 4, and 5 are
used as shortened names for S#, SNAME, STATUS, CITY, and S,
respectively; and XX is the pointer to S's directory or hashing
algorithm. Note that DIC_1 through DIC_4 pertain to attributes
while DIC_5 pertains to the relation S.

Continuing with Date's example, consider the GET statement,
i.e. GET W(S.S#, S.STATUS): S.CITY = 'LONDON'. To process this
job the DBMS must first operate on the data dictionary to find
the descriptors of each of the attributes in question (i.e. S#,
STATUS, and CITY). Therefore, the DBMS searches the data
dictionary until it finds the proper attribute, then writes the
proper DIC_i into the user's area of the computer memory for
further processing. An example of this process using DP sets
can be shown for the attribute STATUS as follows:

    Set i = 1.

    1) If {S,T,A,T,U,S} ∩ P(DIC_i) = {S,T,A,T,U,S}, go to 2;
       otherwise set i = i + 1 and go to 1.

    2) Write DIC_i into the user's area.

Once the desired attributes are found (or it is determined that
they do not exist, in which case processing will be stopped), the
proper F/R's can be found in a similar way.
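The search loop above can be sketched as follows. This is only an
illustration: the entries are stored as flat lists of characters
with "/" delimiters, as in the Dictionary Example, and the helper
names (`make_entry`, `fields_of`, `find_entry`) are hypothetical.

```python
DELIM = "/"


def make_entry(*fields):
    """Build a dictionary entry as a flat list of characters with "/" between fields."""
    elements = []
    for k, field in enumerate(fields):
        if k:
            elements.append(DELIM)
        elements.extend(str(field))
    return elements


def fields_of(entry):
    """Split a flat entry back into its fields."""
    return "".join(entry).split(DELIM)


DICTIONARY = [
    make_entry(1, "S#", "al", 5),
    make_entry(2, "SNAME", "al", 20),
    make_entry(3, "STATUS", "i", 3),
    make_entry(4, "CITY", "al", 15),
    make_entry(5, "S", "XX", 1),
]


def find_entry(name, dictionary=DICTIONARY):
    """Scan entries in order (i = 1, 2, ...) and return the first whose
    name field equals `name`; return None when the name is absent."""
    for entry in dictionary:
        if fields_of(entry)[1] == name:
            return entry
    return None


print(find_entry("STATUS"))
```

A missing name returns `None`, which corresponds to the case in
which processing is stopped.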

In  summary,  attributes'  names  and  F/R's  names  may  be  stored 
in  a data  dictionary.  The  descriptors  that  may  be  found  for  any 
attribute  name  are  the  attribute's  shortened  name,  representation 
technique,  format,  size,  and  synonym  data.  The  descriptors  that 
may be found for an F/R name are the F/R's shortened name, a
pointer to its directory and/or hashing algorithm, and its primary
key's shortened name.

The elements in the data dictionary relating to the attributes
provide the DBMS the information required to process their
occurrences when they are written into the user's area of the
computer memory. The method of obtaining these occurrences is
dependent on the F/R element in the data dictionary. For the
example above, the two important elements are the pointer to the
F/R's hashing algorithm and/or directory and the primary key's
shortened name. The pointer may point to either a data directory
or a hashing algorithm and directory. If the pointer points to
the data directory, then it may be assumed that the F/R occurrences
will follow contiguously, through pointers, or through an inverted
list based upon the primary key's shortened name. Otherwise the
directory can be assumed to be stored with the hashing algorithm,
also based upon the primary key's shortened name.


DATA DIRECTORY

A data  directory  can  be  thought  of  as  a data  base  road  map. 
It  directs  the  computer  to  the  required  occurrences  of  data  once 
the  data  dictionary  has  been  processed.  A data  directory  is 
modeled  in  this  portion  of  the  chapter  using  sets  and  DP  sets. 
This  is  followed  by  an  example.  A discussion  of  data  retrieval 
techniques  is  provided  along  with  models  of  an  inverted  list  and 
a disk  device. 


Directory  Model 


A data directory can be modeled as a Normal DP set. Again,
like the data dictionary, the actual structure of a data directory
is data base or problem dependent. To show how a data directory
can be modeled by DP sets, consider the following general model for
a tree data structure. The first element is an integer (IX_1)
depicting the size of the following element; the second element
is the shortened name of the F/R (θ_i); the third element is IX_2,
an integer depicting the number of attributes in θ_i; the fourth
element is an integer (IX_3) depicting the number of occurrences
of the relationship (if IX_3 is greater than 1, then this implies
a "repeating group"); the fifth element is an integer (IX_4)
designating the number of subrelations ("subsumed relations"); and
the sixth element is an integer (IX_5) designating the size of the
following element. The sixth element is similar to the first
element. The seventh element will be the first attribute's name
A_1 in the relation. The sixth and seventh elements will repeat
pairwise IX_2 times. Then there will be a list of IX_3 zeros
separated by delimiters, which are "place holders" for pointers
to other locations in the computer. These locations will contain
the occurrences θ_{i,j} of the relationship desired. Then the above
starts over again for at least the number of subrelations, IX_4.
It is at least IX_4 times because a subrelation may itself have
subrelations.


Directory  Example 

Consider  the  following  Normal  DP  set  model  for  Date's 
example  where  "//"  signifies  delimiters  before  and  after  a data 
directory  and  "/"  signifies  a delimiter  between  elements. 

DIR = {/,/,1,/,S,/,4,/,1,/,0,/,2,/,S,#,/,5,/,S,N,A,M,E,/,
       6,/,S,T,A,T,U,S,/,4,/,C,I,T,Y,/,V,/,X,/,Y,/,Z,/,/}.

The values of V, X, Y, and Z are the computer locations of the
4 occurrences of S. In location V the following Normal DP set
model would depict the first occurrence of F/R S:

{/,/,2,/,S,1,/,5,/,S,M,I,T,H,/,2,/,2,0,/,
 6,/,L,O,N,D,O,N,/,P,1,/,/}

where P1 is the location where X is stored in the DIR model.

Considering the same example used before, i.e. GET W(S.S#,
S.STATUS): S.CITY = 'LONDON', the pointer in the data
dictionary has located DIR shown above. Before the DBMS begins
to process the occurrences of S, it must first evaluate DIR to
determine if, in fact, it does represent S and if the attributes
in question (i.e. S#, STATUS, and CITY) are in fact attributes in
S. The DBMS would evaluate the contents of DIR through a software
code whose logic would be directed by the elements in DIR. Consider
the following brief example as a model of what this code is
required to do, for first verifying that the F/R directory is for
S and whether or not the first attribute is S#.

    IF {S} ∩ P(DIR) = {S} CONTINUE

    IF {S,#} ∩ P(DIR) = {S,#} CONTINUE.

(These examples are very simple and are presented only to show
in detail that DP sets can be used in describing non-numerical
processing in DBM.) Once all the attributes in question have
been found in DIR, the DBMS may proceed in following the
pointers (V, X, Y, Z) to find the occurrences of S and possibly
writing them into the user's work area for the data processing
required to provide the proper result (W).
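The directory check described above can be sketched as follows.
The layout follows the DIR example ("//" around the directory, "/"
between elements). Two simplifications are assumptions of this
sketch: subrelations are not handled, and every element after the
attribute list is treated as a pointer. The function names are
hypothetical.

```python
def parse_directory(elements):
    """Parse a flat list of characters laid out like the DIR example."""
    text = "".join(elements)
    if not (text.startswith("//") and text.endswith("//")):
        raise ValueError("directory must begin and end with //")
    fields = text[2:-2].split("/")

    name_size, name = int(fields[0]), fields[1]
    if len(name) != name_size:
        raise ValueError("F/R name does not match its size element")
    n_attributes, n_occurrences, n_subrelations = (int(f) for f in fields[2:5])

    attributes, pos = [], 5
    for _ in range(n_attributes):
        size, attribute = int(fields[pos]), fields[pos + 1]
        if len(attribute) != size:
            raise ValueError("attribute name does not match its size element")
        attributes.append(attribute)
        pos += 2

    return {
        "name": name,
        "attributes": attributes,
        "occurrences": n_occurrences,
        "subrelations": n_subrelations,
        "pointers": fields[pos:],  # assumption: the rest are pointers
    }


def directory_covers(directory, fr_name, wanted):
    """True when the directory is for `fr_name` and holds every wanted attribute."""
    return directory["name"] == fr_name and all(
        attribute in directory["attributes"] for attribute in wanted
    )


DIR = list("//1/S/4/1/0/2/S#/5/SNAME/6/STATUS/4/CITY/V/X/Y/Z//")
parsed = parse_directory(DIR)
print(directory_covers(parsed, "S", ["S#", "STATUS", "CITY"]))
```

Only after this check succeeds would the pointers V, X, Y, and Z
be followed.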

Directory/Data  Retrieval 

There exist data retrieval techniques other than the one
discussed above. The example depicts what may occur if the
occurrences were stored and found through the utilization of
pointers embedded in DIR. If the occurrences were stored
sequentially, then pointers would not have been needed and the
occurrences would have been contiguous with DIR. Other techniques
could also be modeled, such as binary trees, single linked lists,
double linked lists, etc. However, if the occurrences are stored
by a hashing algorithm, then the pointer in the data dictionary
would point to DIR, and following DIR would exist a mathematical
algorithm that would hash on the primary key value and generate
an address for the proper occurrence of the F/R required. Hashing
algorithms are numerical in nature and therefore no attempt
has been made to model them using DP sets.

Another major way in which the occurrences may have been
located is through the use of inverted lists. These can be
modeled using DP sets. Let an inverted list be modeled as a DP
set where one element can be modeled as the following Normal DP
set:

$$
\{\{A_i\}, \{A_{i,1}, L_i, L_j, \ldots\}, \{A_{i,2}, L_a, L_s, \ldots\}, \ldots, \{A_{i,p}, L_l, L_d, \ldots, L_n\}\}
$$

where A_i is a Normal DP set of the primary or secondary key's
name,

A_{i,j} is a Normal DP set of the primary or secondary key's
occurrence, and

L_x is a Normal DP set of the primary or secondary key's
occurrence's address or location (i.e., a pointer).

Note that for primary key attributes there will exist only one
L_x per A_{i,j}. A fully inverted list will contain as many of the
above elements as there are attributes in the F/R.
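One element of such an inverted list can be sketched as a nested
list: the key's name first, then one group per distinct occurrence
followed by its pointers. The city-to-location pairs below are
hypothetical and serve only to show the shape; the function names
are likewise assumptions.

```python
def build_inverted_list(key_name, occurrences):
    """Group locations by key occurrence.

    `occurrences` is an iterable of (key_value, location) pairs. The result
    is [key_name, [value_1, loc, loc, ...], [value_2, loc, ...], ...].
    """
    groups = {}
    for value, location in occurrences:
        groups.setdefault(value, []).append(location)
    return [key_name] + [[value, *locations] for value, locations in groups.items()]


def lookup(inverted_list, value):
    """Return the pointers stored for one key occurrence (empty if absent)."""
    for group in inverted_list[1:]:
        if group[0] == value:
            return group[1:]
    return []


# Hypothetical secondary-key data for the attribute CITY.
city_index = build_inverted_list(
    "CITY", [("LONDON", "V"), ("PARIS", "X"), ("LONDON", "Z")]
)
print(city_index)
print(lookup(city_index, "LONDON"))
```

For a primary key each group would hold exactly one pointer, since
each occurrence of the key is unique.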

In the inverted list above, if the occurrences are stored
in main memory, then these pointers (L_x) will contain their address
or memory location. If these occurrences are located, say, on a
disk or drum, then these pointers would designate the portion of
the peripheral device that should be copied into main memory.
An example of a pointer to a disk may be constructed as a Normal
DP set where the first element is the disk name D_i, the second
element might be an integer denoting the cylinder C_j, the third
element might represent the track T_k, and the last element might
represent the block B_l. Given this description, the format for
a pointer to a physical block would be

$$
\{D_i, C_j, T_k, B_l\}.
$$

In  summary,  a data  directory  was  discussed  and  modeled 
using  DP  sets.  An  example  was  presented  for  illustrative  purposes. 
The  data  directory  assists  in  providing  to  the  user's  work  area 
the  occurrences  of  the  data  requested.  Once  the  data  are  in  the 
user's work area then they can be processed by the DBMS's codes.


OCCURRENCES OF ATTRIBUTES AND FILE/RELATIONSHIPS


The occurrences of the data that the computer operates upon
are in a binary code. The modeling described in this chapter has
been performed on elements contained in the character set AN.
It is the purpose of this portion of the chapter to present the
logic that shows that there exists a one-to-one mapping between
the modeling performed at the character level and the modeling
performed at the bit level. Accomplishing this will show that sets
and DP sets can model the non-numerical processing of DBM for all
four levels of data. The methodology used is to show a one-to-one
relationship for one character. Since each character's binary
representation is unique, the joining of more than one
character also produces a unique and one-to-one correspondence
to its binary representation.


ALPHANUMERIC/BINARY 

The data that the user describes to the DBMS must be
transformed to a binary abstraction compatible with the digital
computer. This binary abstraction does not, however, destroy
any of the models developed so far. This is so since all
communication with the computer was through elements of P(AN)
and every unique element in AN can be modeled as a unique Normal
DP set of power 1 (e.g. {A^1}, {B^1}, etc.). Examples of this are
such codes as the ASCII standard code (8 bit). From Foster [16] a
one-to-one correspondence of the unique elements in AN and their
uniquely ordered elements can be seen.

These uniquely ordered elements (k) are all either zero or
one. These can be modeled as a Normal DP set B_i defined on a
permutation, with replacement, of an element contained in P{0,1}.
In the case of ASCII the number of zeros and ones is 8. This
implies that there are 2^8 or 256 unique elements (Normal DP sets)
that can be obtained from P{0,1}, i.e.

$$
(P\{0,1\})^{k} = 2^{8} = 256.
$$

Let this set of 256 elements (B_i's) be called the total binary
set or TB. It is now possible to define a function g which maps
each element in AN to its respective element in TB. An example
of the above defined terms and modeling can be seen as follows:


AN          g(AN) = B_i ∈ TB

(blank)     {1^1, 0^2, 1^3, 0^4, 0^5, 0^6, 0^7, 0^8}

!           {0, 0, 1, 0, 0, 0, 0, 1}

+           {0^1, 1^8, 0^2, 1^7, 1^3, 0^6, 0^4, 1^5}

H           {0, 1, 0, 0, 1, 0, 0, 0}

The function g is between sets AN and TB such that:

1) the Domain of g, D(g), is equal to AN, and

2) if g(x) = y and g(x) = z, then y = z, where x ∈ AN and
y, z ∈ TB.

It should be noted that since the power of TB is 256 and the power
of AN is less than 256, g is one-to-one and into. A function f is
said to be a function from A into B if the range of f is not equal
to B, i.e. f is not onto B. This implies that the Range of f is a
proper subset of B. For the above example, since g is one-to-one
and into, there exists an element (B_i) in TB such that
g(a_j) ≠ B_i for every element (a_j) in AN. This leads to the
definition of the subset of TB which is not in the Range of g,
say NTB, such that


$$
\mathrm{NTB} = \mathrm{TB} - \bigcup_{i=1}^{PAN} g(a_i)
$$

where a_i ∈ AN and PAN = power of AN.

It is necessary to continue with the above modeling philosophy
in order to define a function from TB to AN. This provides part
of the modeling portion for the communication of the computer back
to the terminal, or printer, etc. But the power of TB is greater
than the power of AN; therefore an artificial variable τ will
be defined as an element of AN. In actuality, this variable, τ,
could represent those cases of errors caused by interference in
data transmission.

Given the above, a function h, similar to g, may be defined
for TB to AN as:

1) the domain of h, D(h), is equal to TB;

2) if h(y) = x and h(y) = z, then x = z;

3) if y ∈ NTB, then h(y) = τ; and

4) if y ∈ (TB - NTB), then h is equal to the inverse of g.
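The functions g and h can be sketched for one character. Two
assumptions are made: AN is taken to be the 95 printable ASCII
characters, and the eighth (leading) bit is an even-parity bit.
The parity choice is inferred from the example codes given for
"!", "+", and "H"; it is not stated in the text.

```python
from itertools import product

AN = [chr(code) for code in range(32, 127)]  # assumed character set
TAU = "\u03c4"  # artificial element returned for codes outside the range of g


def g(ch):
    """Map one character to an 8-tuple: even-parity bit, then 7 ASCII bits."""
    if ch not in AN:
        raise ValueError("character is not in AN")
    code = ord(ch)
    bits7 = [(code >> k) & 1 for k in range(6, -1, -1)]
    return (sum(bits7) % 2, *bits7)


TB = list(product((0, 1), repeat=8))          # all 256 binary 8-tuples
G_INVERSE = {g(ch): ch for ch in AN}          # inverse of g on its range
NTB = [bits for bits in TB if bits not in G_INVERSE]


def h(bits):
    """Inverse of g on the range of g; TAU for every element of NTB."""
    return G_INVERSE.get(tuple(bits), TAU)


print(g("H"), len(TB), len(NTB))
```

Because g is one-to-one, `G_INVERSE` has exactly one entry per
character, and the remaining elements of TB make up NTB.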


These  two  functions  g and  h have  closed  the  modeling  loop 
between  the  user's  high  level  language  and  the  digital  computer's 
language  (binary)  and  back  again  for  one  character.  It  can  be 
easily  seen  that  a Normal  DP  set  defined  on  a permutation,  with 
replacement, of two or more elements in P(AN) also has a one-to-one
mapping to TB and back again. Consider the following example.

Let A_i = {H, +} = a_j ∪ P^1(a_k), where a_j = {H} and a_k = {+};
then

$$
g(A_i) = g(a_j) \cup P^{8}(g(a_k)) = \{0,1,0,0,1,0,0,0,0,0,1,0,1,0,1,1\}.
$$

For h({0,1,0,0,1,0,0,0,0,0,1,0,1,0,1,1}), or the return of A_i, let
X = {0,1,0,0,1,0,0,0,0,0,1,0,1,0,1,1}.

Let Y = (X - Y_1), where
Y_1 = {0^9, 0^10, 1^11, 0^12, 1^13, 0^14, 1^15, 1^16} and
Y_2 = P^8(Y_1);

then A_i = h(Y) ∪ P^1(h(Y_2)) = {H, +}.
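The round trip for {H, +} can be checked with a short sketch.
As an assumption inferred from the example codes, each character
is encoded as 7 ASCII bits preceded by an even-parity bit, and the
16-element result is split back into 8-element groups before h is
applied to each. The function names are hypothetical.

```python
TAU = "\u03c4"  # artificial element for codes that g never produces


def g(ch):
    """One character to 8 bits: even-parity bit followed by 7 ASCII bits."""
    code = ord(ch)
    if not 32 <= code < 127:
        raise ValueError("character is not in the assumed set AN")
    bits7 = [(code >> k) & 1 for k in range(6, -1, -1)]
    return [sum(bits7) % 2] + bits7


def h(bits):
    """8 bits back to one character, or TAU when the code is not in the range of g."""
    parity, bits7 = bits[0], bits[1:]
    code = int("".join(str(b) for b in bits7), 2)
    if parity != sum(bits7) % 2 or not 32 <= code < 127:
        return TAU
    return chr(code)


def g_string(chars):
    """Concatenate the codes; character n lands at positions 8(n-1)+1 .. 8n."""
    bits = []
    for ch in chars:
        bits.extend(g(ch))
    return bits


def h_string(bits):
    """Split into groups of 8 (the role of Y and Y_1 above) and decode each."""
    if len(bits) % 8:
        raise ValueError("bit string length must be a multiple of 8")
    return "".join(h(bits[i:i + 8]) for i in range(0, len(bits), 8))


X = g_string("H+")
print(X)
print(h_string(X))
```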

SUMMARY 

This  chapter's  purpose  was  to  provide  a description  of  how 
the  mathematical  base  provided  in  Chapter  IV  could  model  DBM. 

The  chapter  was  divided  into  four  parts  covering  a range  from 
that  level  seen  and  utilized  by  a user  down  to  the  bit  level  of 
a digital  computer.  Throughout  the  chapter,  where  appropriate, 
examples  were  used  to  clarify  the  concepts  presented. 

The next chapter contains a description of a proposed
hardware implementation of a DP characteristic set model. The model
describes the functions performed within a data dictionary and
partial data directory by a DBMS.


Chapter  VI 

DICTIONARY/DIRECTORY  PROCESSOR  (DDP) 


INTRODUCTION 

Four levels of data exist and can be seen by a user through
interacting with a DBM system. These levels can be called:

1) Reserved Word,

2) Data Name,

3) Data Descriptor, and

4) Data Occurrence.

The Reserved Word level includes those words that are "reserved"
for the DBM system, e.g. the words GET, KEY, RELATION, and
DOMAIN, as shown in Fig. V-1. The Data Name level includes those
names specified by the user to identify specific attributes and
F/Rs in the data base. Examples of these names, as shown in
Fig. V-1, are the F/R name S and the attribute names S#, SNAME,
STATUS, and CITY. The Data Descriptor level includes those
words and symbols that are utilized to "describe" the occurrences
of the names in the Data Name level. Referring to Fig. V-1, these
are:

1) the words Character and Numeric;

2) the domain sizes designated as 5, 20, 3, and 15; and

3) the F/R name S, and its description (S#, SNAME,
STATUS, CITY).

The Data Occurrence level includes those data most often sought
by the user. Referring to Fig. V-1, note the attribute names
and F/R name S and their occurrences:

Name        An Occurrence

SNAME       BLAKE
STATUS
CITY        LONDON
S           (S2, JONES, 10, PARIS)


Associated with each of the above levels are inherent
functions performed by a DBMS. For instance, at the Reserved
Word level, the DBMS parses the data and performs syntactical
functions. At the Data Name level the DBMS must build and
maintain a data dictionary and data directory. The Data
Descriptor level provides the data for the data dictionary and
directory. At the Data Occurrence level the DBMS performs
equal to, less than, and greater than search functions, sort
functions, add functions, and delete functions.

In previous work some of the functions associated with the
Data Occurrence level have been implemented in hardware. These
were described in Chapter II. They included the use of associative
processors, logic on disk devices, and hypothetical machines.

This research is concerned with some of the functions
associated with the Data Name and Data Descriptor levels and
their implementation in hardware. These functions operate on
a DBMS's data dictionary and data directory. A data dictionary
can be thought of as that part of a DBMS where information is
maintained about the characteristics of the data occurrences. A
data directory is that portion of a DBMS that maintains information
pertaining to the relationships, both physical and logical, between
data names and their occurrences. The data names and descriptors
that have been chosen to illustrate a hardware implementation of
a data dictionary and partial directory processor, or Dictionary/
Directory Processor (DDP), are the following:

1)  Attribute Name - The name of an attribute as known
by the user of the DBMS.

2)  Attribute's Shortened Name - A shortened attribute
name utilized for computing efficiency.

3)  Representation - The name of the representation
technique for an attribute's occurrence.

4)  Size Descriptor - The size descriptor for an
attribute's occurrence.

5)  Synonym Function - The name of the function that
converts an attribute's occurrences to a format
compatible (both in representation technique and
size) with its synonym's occurrences.

6)  Uniqueness Descriptor - An attribute descriptor
for designating whether an attribute's occurrences
are unique.

7)  Password Name - A name used by the DBMS to aid in
restricting access to an attribute's occurrences.

8)  Privileged Name - A name used by the DBMS to aid in
maintaining the validity of an attribute's
occurrences.

9)  F/R Name - The name of an F/R as known by the
user of the DBMS.

10)  F/R Shortened Name - A shortened F/R name utilized
for computing efficiency.

11)  F/R Primary Key - An attribute's shortened name
of the key for an F/R.

12)  F/R Pointer - A pointer (directly or indirectly)
to an F/R's inverted list, hashing algorithm, or
data directory.

13)  Containment Map - A descriptor which designates
which attributes are associated with each F/R.

14)  Bit Map - A descriptor that designates which
F/Rs are related to each other.


SAMPLE PROBLEM


The sample problem shown in Fig. V-1 is concerned with one
F/R (S), four attributes (S#, SNAME, STATUS, and CITY) and a key
(S#). The shortened names for S, S#, SNAME, STATUS, and CITY
were chosen as 5, 1, 2, 3, and 4, respectively. The form of
representation for S#, SNAME, and CITY is Alphanumeric or
Character (al) and for STATUS, integer or numeric (i). The
size descriptors for S#, SNAME, STATUS, and CITY are 5, 20, 3,
and 15 characters. The F/R pointer has been chosen as XX. The
problem is to provide to the user an F/R, W, based on the
occurrences of the attribute CITY, where W is composed of the
attributes S# and STATUS. The problem is posed to the DBMS
through one of its reserved words, GET. This is expressed as:

    GET W(S.S#, S.STATUS): S.CITY = 'LONDON'.

This can be interpreted as asking the DBMS to provide the
occurrences of the attribute names S# and STATUS, of F/R S, for
those occurrences of F/R S whose attribute name, CITY, has an
occurrence equal to LONDON.
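The meaning of the GET statement can be sketched as a selection
followed by a projection. The two occurrences of S used below are
the ones that appear in the text, (S1, SMITH, 20, LONDON) and
(S2, JONES, 10, PARIS); the function name `get` and the use of
dictionaries for occurrences are assumptions of this sketch, not
the DBMS's own mechanism.

```python
S = [
    {"S#": "S1", "SNAME": "SMITH", "STATUS": 20, "CITY": "LONDON"},
    {"S#": "S2", "SNAME": "JONES", "STATUS": 10, "CITY": "PARIS"},
]


def get(relation, targets, attribute, value):
    """Keep the occurrences whose `attribute` equals `value`, then
    return only the target attributes of each one."""
    return [
        tuple(occurrence[target] for target in targets)
        for occurrence in relation
        if occurrence[attribute] == value
    ]


# GET W(S.S#, S.STATUS): S.CITY = 'LONDON'
W = get(S, ("S#", "STATUS"), "CITY", "LONDON")
print(W)
```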

DICTIONARY/DIRECTORY PROCESSOR INTERFACE LEVEL

The method by which the DDP interfaces with the DBMS to
solve the above problem will now be described. Referring to
Fig. VI-1, the user communicates with the DBMS, resident on the
sequential computer, and submits his/her GET command. The
sequential computer, directed by the DBMS, will perform its
overhead functions, e.g. parse the command and perform its
syntactic functions. This provides the DBMS with the necessary
information for performing functions on the data dictionary and
data directory. This information includes which characters in the
command represent an F/R name, attribute name, or an attribute
occurrence, and which attributes are associated with which F/R.
The DBMS would direct the arrangement of this information and
data into a form of commands and controls. These commands and
controls, along with the data, are stored in the transfer/user's
memory area (TUMA), and the sequential computer's CPU is notified
that a job for the DDP is ready for execution and where in the
TUMA it is located. The sequential computer's CPU then notifies
the control of the DDP about the above information and assigns a
location in TUMA for the DDP to store its results after completing
its functions.

Figure VI-1. Dictionary/Directory Processor - Sequential Computer Interface


Six  Functions 

The  functions  performed  by  the  DDP  provide  six  different 
sets  of  results.  The  first  function  determines  if,  in  fact,  the 
F/R  names  and  attribute  names  exist  in  the  data  base.  If  they 
do,  then  the  DDP  provides  to  the  DBMS  their  shortened  names.  If 
a name(s)  does  not  exist,  then  the  DDP  informs  the  DBMS  and  stops 
processing  the  function.  For  the  example  above,  the  following 
would  be  provided: 

Attribute Name/Shortened Name      F/R Name/Shortened Name

S#/1                               S/5
SNAME/2
STATUS/3
CITY/4

The second function determines if the attributes have synonyms.
If they do, then their attribute names must be determined,
depending on whether the DBMS supports type I or type II synonyms
(see DICTIONARY CONSTRUCTION, Chapter V) and the type of command
making the request. For the above example there are no synonyms.
The third function determines if the attributes in the user's
request are associated with their respective F/R name. In the
example, the containment map would be utilized to determine if the
F/R S has S#, STATUS, and CITY as associated attribute names. If
not, the DDP informs the DBMS and stops processing the function.
If there were more than one F/R involved in the user's request
such that they implied a relationship among them, then the fourth
function would be to determine if, in fact, the DBMS supported a
relationship among these F/Rs. If the DBMS did not, the function
would be aborted. The fifth function would provide the rest of
the descriptors for each attribute name and synonym name involved
in the requesting command. For the example, the following would
be provided:


Attribute Name      Representation

SNAME
STATUS

The sixth function provides the primary key and pointer for
each F/R and related F/R involved in the requesting command. For
the above example, the following would be provided:

F/R Name      Primary Key      Pointer


DICTIONARY/DIRECTORY PROCESSOR - DATA SYSTEM

by utilizing a configuration of AMs and ARRAYS of logic. Control
and data are passed from the CMC to the AMs and ARRAYS and then
to the sequential computer. Once a job is completed by the DDP,
the DDP's CMC informs the sequential computer of its findings and
transfers the resultant data to the proper location in the TUMA.
The DBMS can then proceed with either informing the user of an
incorrectly formulated command or proceed to fulfill the command
by retrieving the proper occurrences of the F/R(s) and their
remaining data directory(s).

DICTIONARY/DIRECTORY PROCESSOR
AMs/ARRAYS

The heart of the DDP is the AMs and ARRAYS as shown in Fig.
VI-2. The four AMs and one Random Access Memory (or AM), RAM/AM,
are physically linked together by four programmable arrays of
"switches." A major and common function that will be performed
on these AMs many times over is searching an AM's memory for a
word whose contents are equivalent to a word placed in its
comparand register. A secondary function is a link function. This
function causes one or more words in an AM or an ARRAY to be
linked to one or more words in another AM or an ARRAY. The word
or words controlling this linkage are usually determined by the
major function discussed above in conjunction with one of the
programmable arrays.


AM1, AM2 and AM3, AM4

The four AMs, RAM/AM and the three arrays contain the data
names and descriptors that were discussed above. AM1 and AM2
contain the Attribute Names and their shortened names,
respectively. AM3 and AM4 contain the F/R Names and their shortened
names, respectively. AM1 is connected to AM2 and AM3 is
connected to AM4 such that each name is connected to its respective
shortened name. To illustrate this, refer to Fig. VI-3. Assume
the first function discussed above is being performed and the
attribute name in question is STATUS. Then the CMU would load
the word STATUS in AM1's comparand register and perform an
equal-to search. The attribute name STATUS does exist in the data
base. Therefore this will cause the word in AM2, where STATUS's
shortened name (3) is stored, to respond. The contents of this word
can then be read into the CMU's memory. This same procedure can
similarly be performed for F/R names to their shortened names.
Inversely, F/R shortened names and attribute shortened names,
once found, can cause their respective names to respond. The
diagonal square cells in ARRAY I contain a flip flop and two AND
gates. These cells provide the connection between AM1 and AM2
which helps perform the second function discussed above. Similarly,
the cells in ARRAY IV help perform the fourth function. These
cells will be discussed in greater detail later.
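The equal-to search and the word-by-word link between AM1 and AM2
can be sketched in software. The class below is a hypothetical
stand-in: a real associative memory compares all words against the
comparand in parallel, whereas this sketch scans them in a loop.

```python
class AssociativeMemory:
    """A list of words that can be searched by content."""

    def __init__(self, words):
        self.words = list(words)

    def equal_to_search(self, comparand):
        """Return the indices of every word equal to the comparand."""
        return [i for i, word in enumerate(self.words) if word == comparand]


AM1 = AssociativeMemory(["S#", "SNAME", "STATUS", "CITY"])  # attribute names
AM2 = AssociativeMemory(["1", "2", "3", "4"])               # shortened names


def shortened_name(attribute_name):
    """Search AM1; the responding word index selects the linked word in AM2."""
    responders = AM1.equal_to_search(attribute_name)
    return AM2.words[responders[0]] if responders else None


def attribute_name(short_name):
    """The inverse direction: search AM2 and read the linked word in AM1."""
    responders = AM2.equal_to_search(short_name)
    return AM1.words[responders[0]] if responders else None


print(shortened_name("STATUS"))
```

A result of `None` corresponds to the case in which the name does
not exist and the DDP informs the DBMS.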

ARRAY  III 

The contents of AM2 and AM3 are connected to each other
through ARRAY III. Imagine ARRAY III as being a large matrix where
each row represents an F/R and each column in the matrix
represents an attribute's shortened name in the data base.

[Figure: The Interaction Among AM1, AM2, and ARRAY I of the
Dictionary/Directory Processor.]

Let each cell of this matrix take on the value zero or one,
where a one in cell i,j indicates that the attribute's shortened
name stored in the j-th word of AM2 is associated with the F/R
name stored in the i-th word of AM3. A zero indicates that they
are not associated. (Note that each row of this matrix is a
Data Processing Characteristic Set (DPCS).) For illustrative
purposes refer to Fig. VI-4. The third function is performed
after the first and second; therefore, the activated words in
AM2 contain the attribute's shortened names. These activated
words are used to drive ARRAY III. To obtain a response in any
row of ARRAY III, the row must have a one in each cell whose
column is being driven. In the example above, as shown in Fig.
VI-4, columns one through four would be activated. Rows one
and two of ARRAY III would respond and activate rows one and two
of AM3. The CMU could then read out the contents of words one
and two of AM3, which contain the F/R names with which the
attribute's shortened names (1, 2, 3, and 4) are associated. The
reverse process is also possible, i.e. given an F/R name the
attribute's shortened names can be obtained.
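
The row-response rule for ARRAY III can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the hardware: the matrix contents, the function names `responding_rows` and `attributes_of`, and the zero-based indexing are assumptions made for the example.

```python
def responding_rows(array3, driven_columns):
    """Return the rows that hold a one in every driven column.

    A row responds only if each driven column finds a one in that row,
    which mirrors the rule stated for ARRAY III.
    """
    return [i for i, row in enumerate(array3)
            if all(row[j] == 1 for j in driven_columns)]


def attributes_of(array3, row_index):
    """Reverse process: list the columns (attributes) set in one F/R row."""
    return [j for j, bit in enumerate(array3[row_index]) if bit == 1]


# Hypothetical 3 x 5 ARRAY III (rows = F/Rs, columns = attribute short names).
array3 = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1],
]

# Drive columns one through four (indices 0..3 here).
rows = responding_rows(array3, [0, 1, 2, 3])
```

With this hypothetical matrix the first two rows respond, which corresponds to rows one and two activating their words in AM3.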


ARRAY  II  - RAM/AM 

The fifth function, which provides the descriptors for each
attribute name, is accomplished through ARRAY II. Not only is AM1
connected to AM2, word by word, but each word in AM1 and AM2 is
connected to its respective row of cells in ARRAY II. ARRAY II(a)
is similar to ARRAY III in that it is a matrix consisting of cells,
where each row represents an attribute's name and its shortened
name. Each column of ARRAY II(a) represents one of the possible
values of each attribute descriptor. These cells consist of one
flip flop and one AND gate. They provide the connection between
an attribute name (AM1 and AM2) and its descriptors (RAM/AM). To
illustrate how function five can be performed, a discussion of
the arrangement of the descriptor's values in the RAM/AM is given
below.

The RAM/AM contains the attribute's descriptors in a
specific order. Assume that the first group of words in the RAM/AM
contains the different sizes an attribute's value may have. The
second group of words contains the different representations.
The third group is the synonym functions, the fourth and fifth
are the uniqueness descriptors, the sixth is the password
descriptors and the seventh, the privileges descriptor. Within
these defined groups, an attribute cannot have more than one value
per group. For each value that applies to an attribute, the proper
cell in ARRAY II is activated. How these are set shall be discussed
later. For now, assume a 1 in cell i,j of ARRAY II, shown in
Fig. VI-5, implies that the value in word j of the RAM/AM describes
the attribute name in word i of AM1. A zero in i,j implies that
the value in word j of the RAM/AM does not apply to the attribute
name in word i of AM1. Assume the fifth function is performed
after the fourth function; then a row in ARRAY II(a) would be
activated. The cells having ones in this row will activate
their corresponding words in the RAM/AM. Once this is performed
the CMU can read the contents of these words, in order, into its
memory. From Fig. VI-5, the attribute's descriptors would be:

From AM1      From RAM/AM

S#            5     a£
SNAME         20    a£
STATUS        3     i
CITY          15    a£
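
The descriptor lookup through ARRAY II can be sketched as a row of bits selecting words of the RAM/AM. The sizes below are the ones listed for Fig. VI-5; the representation labels `"alphanumeric"` and `"integer"`, the word order, and the names `RAM_AM`, `ARRAY_2`, and `descriptors` are assumptions made only for this illustration.

```python
# First portion of the RAM/AM: a group of sizes, then a group of
# representations (the labels are hypothetical placeholders).
RAM_AM = [3, 5, 15, 20, "alphanumeric", "integer"]

# ARRAY II: one row per attribute name, one column per RAM/AM word.
# A one in column j means word j of the RAM/AM describes the attribute.
ARRAY_2 = {
    "S#":     [0, 1, 0, 0, 1, 0],
    "SNAME":  [0, 0, 0, 1, 1, 0],
    "STATUS": [1, 0, 0, 0, 0, 1],
    "CITY":   [0, 0, 1, 0, 1, 0],
}


def descriptors(attribute_name):
    """Read, in order, the RAM/AM words activated by the attribute's row."""
    row = ARRAY_2[attribute_name]
    return [RAM_AM[j] for j, bit in enumerate(row) if bit == 1]
```

Because each attribute has at most one value per group, each row holds at most one set cell inside any group of columns.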

AM3, AM4 and RAM/AM

The sixth function, which provides the primary key and
pointer descriptor for each F/R, is accomplished through AM3 and
AM4 connected directly to the RAM/AM. Each word in AM3 and AM4
is connected to two consecutive words in the RAM/AM. Since
function six is performed last, each F/R name, or shortened name,
is known and either one can supply the driving signal that
activates the RAM/AM to obtain its descriptors. To understand how
the sixth function is performed, the second portion of the RAM/AM
must be described.

The words in the RAM/AM following the attribute name
descriptors are reserved for the primary key and pointer descriptors
for F/R names. For each F/R in the data base there are two words
in the RAM/AM. The first contains the name of the F/R's primary
key and the second contains the F/R's pointer descriptor. For
illustrative purposes refer to Fig. VI-6. Assume the sixth
function is performed after the fourth function; then the proper
words in AM3 and AM4 have been activated. These words can then
be used to activate their corresponding words in the RAM/AM. For
the example above, the sixth function would provide the following
after the CMU read the corresponding words from the RAM/AM and

F/R Name    Primary Key


DICTIONARY/DIRECTORY PROCESSOR GATE AND FLIP FLOP LEVEL

To describe the DDP's AMs and ARRAYs at the gate and flip
flop level, a series of sections of the hardware will be
investigated individually. All the structures represent logical
relations only. Neither optimization nor electrical constraints are
considered. The only elements used are RESET/SET (RS) flip flops


GENERAL  AM 


Consider Fig. VI-7 as a representation of an AM. Each word
would contain the bit configuration representing its respective
stored data, e.g. if it is AM1 then the stored data are attribute
names. The lines for setting each flip flop are not shown but
would be connected to the set and reset ports of each flip flop.
This figure is a model of all the AMs shown in Fig. VI-2, ARRAY
III, and the RAM/AM if it is implemented as an AM. It can be used
to discuss how an equal to search is accomplished in an AM. Assume
that the words in the AM are loaded with their proper data and
the equal to search word is stored in the comparand register.
For each bit (i) in the comparand register, one of its interrogation
lines ($I_{0,i}$ or $I_{1,i}$) is energized by the CMU. If bit i in
the comparand register is equal to zero then interrogation line
$I_{0,i}$ is energized, and if bit i in the comparand register is
equal to one then interrogation line $I_{1,i}$ is energized. If any
of the words in the AM are equal to the contents of the comparand
register, then the AND gate for the equivalent word will be
activated. In addition, if an equal to search is desired on only a
portion of the contents of an AM word, then those bits that are not
required would have both their interrogation lines, $I_{0,i}$ and
$I_{1,i}$, energized.

Figure VI-6. Interaction Between AM3 and the RAM/AM of the
Dictionary/Directory Processor.
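
The equal to search with masking can be modeled in software as follows. This is a minimal sketch of the logical behavior only, assuming each stored bit passes a signal to its word's AND gate when the interrogation line matching its stored value is energized; the function names and the list-of-bits representation are illustrative and are not taken from the hardware description.

```python
def interrogation_lines(comparand, mask):
    """For each bit i, return the pair (I0_i, I1_i) of energized lines.

    mask[i] is True when bit i takes part in the search. A bit that is
    masked out has both of its lines energized, so it matches any value.
    """
    lines = []
    for bit, care in zip(comparand, mask):
        if care:
            lines.append((bit == 0, bit == 1))
        else:
            lines.append((True, True))
    return lines


def equal_to_search(words, comparand, mask=None):
    """Return one response bit per AM word (the output of its AND gate)."""
    if mask is None:
        mask = [True] * len(comparand)
    lines = interrogation_lines(comparand, mask)
    # A stored bit b responds to line I_{b,i}; the word AND gate needs all bits.
    return [all(lines[i][b] for i, b in enumerate(word)) for word in words]


words = [[1, 0, 1], [1, 1, 1], [0, 0, 1]]
full = equal_to_search(words, [1, 0, 1])
partial = equal_to_search(words, [1, 0, 1], mask=[False, True, True])
```

In the partial search the first bit position is masked, so the first and third words both respond.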


Response  Register 

The outputs of these AMs, through their word AND gates,
energize a response register. A response register with the
additional capability of finding the first response is sometimes
called a ladder circuit and can be modeled as shown in Fig. VI-8.
To find the first response, assume the flip flops in Fig. VI-8
are initially set to zero by energizing line R. If, for
instance, an equal to search is being performed and word 2 in
the AM is equal to the word in the comparand register, then
line 2 (from word 2's AND gate) will activate its respective
flip flop in the response register. To find the first
responsive word after an equal to search in the AM, the ladder
circuit's interrogation line is energized and the first
responding word (2) will be sensed by line $S_2$. To find the next
responsive word, after the present word has been processed, the
reset line ($r_2$) would be energized and the interrogation line
would again be energized. It is assumed that the energizing of
$r_i$ resets only flip flop i. This procedure would continue until
all responsive words were determined.
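
The find-first-and-reset cycle of the ladder circuit can be sketched as a loop over a list of response bits. This is a behavioral illustration under the assumption that the ladder always senses the lowest-numbered set flip flop; the names `first_response` and `iterate_responders` are hypothetical.

```python
def first_response(register):
    """Return the index of the first set flip flop, or None if none is set."""
    for i, bit in enumerate(register):
        if bit:
            return i
    return None


def iterate_responders(register):
    """Visit every responder in order: sense the first, reset it, repeat."""
    register = list(register)      # work on a copy of the response register
    found = []
    while True:
        i = first_response(register)
        if i is None:
            break                  # no responders left
        found.append(i)            # the word is processed here
        register[i] = 0            # energizing r_i resets only flip flop i
    return found
```

Note that one extra sensing step is needed after the last responder to discover that none remain, which is consistent with the extra find-and-reset operation counted later for multiple retrieves.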

Function  I 

Given the above information, it is now possible to add
more detail to the six functions discussed above. Consider Figs.
VI-9 and VI-10. Figure VI-9 provides a detailed view of ARRAY I
shown in Fig. VI-3. Figure VI-10 represents a cell in ARRAY I.
The connection of AM1 and AM2 is shown by the lines 1, 1', 2, 2',
etc. To activate this connection the CMU must energize
interrogation line Ig. This is done after an equal to search is
performed in AM1 as discussed above. The energizing of Ig causes
the respective flip flops in the response register of AM2 to
respond, thereby designating the correct words to be read by the
CMU from AM2. The same procedure is applicable for AM3 to AM4,
except that the corresponding interrogation line of AM3 is
energized instead of Ig.

Functions  II  and  IV 

The second and fourth functions are very similar in concept
and can be performed structurally using the same procedure and
hardware. But first, a synonym attribute must be defined. Two
or more attributes which have, at the user's level, the same
"value" but have different names are synonym attributes. As stated
previously, there are two major ways in which to handle synonyms:

1)  Type I. Store the occurrences of the synonyms in
only one description and develop additional functions
to convert the different occurrences from one
description to another.

2)  Type II. Store the occurrences of the synonyms in
their different descriptions, e.g., representation,
size, etc.

The second and fourth functions are similar in the respect
that when given an attribute name the DDP is requested to find its
synonyms, and when given an F/R name the DDP is requested to find
its related F/R names. The key to the similarity is that two
or more sets of synonym attributes are mutually disjoint and
two or more sets of related F/Rs are also mutually disjoint.

To structurally implement these functions, an additional
constraint (beyond the fact that the j-th word in AM1 and AM2
must contain the same attribute's name and shortened name,
respectively) is that synonym attributes in AM1 and their
respective shortened names in AM2 must be stored in contiguous
words. (These same constraints are also needed for F/R names and
their respective AMs.)

A further constraint for Type I synonyms only should be
noted here. In any contiguous set of synonym words, the first
word should contain the synonym attribute name whose descriptors
are those of the stored occurrences. Given these constraints, the
second function can be performed after the first function has
been performed. The procedure can be illustrated through an
example. Suppose in Figs. VI-9 and VI-10 the first two words in
AM1 contain synonyms, function one has been performed on the first
synonym, and in Fig. VI-10 i is equal to 1. Then to perform
function two the CMU will energize SI1 and SI2. If the flip flop
in cell 1 is set, this will cause the response registers in the
second word of AM1 and AM2 to be set. Similarly, if the second
and n-th cells are not set, then the signals in SI1 and SI2 will
terminate and the only response registers that will be set will
be those of the first two words of AM1 and AM2. The responding
words in the AMs can then be read by the CMU to fulfill the second
function. The same procedure is valid for performing function four
for AM3, AM4, and ARRAY IV.
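
The way a signal spreads through contiguous synonym words can be sketched as follows. This is a simplified model under stated assumptions: `cells[k]` stands for the flip flop that links word k to word k+1, and the two signal lines SI1 and SI2 are modeled as propagation in the two directions along the array. The function name `synonym_group` and this reading of SI1 and SI2 are illustrative assumptions, not details confirmed by the figures.

```python
def synonym_group(cells, start):
    """Return the contiguous words reached from the responding word.

    cells[k] == 1 means word k and word k+1 belong to the same synonym
    set. Propagation stops at the first cell that is not set.
    """
    low = start
    while low > 0 and cells[low - 1]:
        low -= 1                   # signal travels toward lower words
    high = start
    while high < len(cells) and cells[high]:
        high += 1                  # signal travels toward higher words
    return list(range(low, high + 1))


# Four words; only the first two are synonyms of each other.
cells = [1, 0, 0]
group = synonym_group(cells, 0)
```

Because synonym sets are mutually disjoint and stored contiguously, a single set cell per adjacent pair is enough to delimit each set.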

In Fig. VI-10 the set and reset lines are not shown. They
exist and are controlled by the CMU. The setting, or resetting,
of these flip flops allows the DDP to respond to changes in the
data base.

Function  III 

To gain more insight into how function III is performed,
consider Fig. VI-11 and Fig. VI-4. In Fig. VI-11, AM3 and AM2
are left out but the lines connecting their respective response
registers are shown. The incoming lines at the top of Fig. VI-11,
when activated properly, represent the state of the response store
of AM2. The lines "n ... 2 1" return to the set ports of
the flip flops in AM2's response register. On the side of Fig.
VI-11, the lines 1, 2, 1', and 2' are wired to AM3 (see Fig. VI-9).

To perform function three the CMU, in a similar way as with
function one, resets the comparand register (shown in Fig. VI-11),
and then energizes interrogation lines 1^ and Ig. These operations
constitute an equal to search performed on the contents of
ARRAY III with the responding attribute shortened names in AM2.
Energizing 1^ initiates the comparand register and energizing Ig
provides a signal to those bit positions that are not in the equal
to search. The set of OR gates associated with Ig performs the
task of "masking" those bit positions not required, hence the
name "mask register." Next, the CMU energizes I<, which will set
the flip flops in AM3's response register (see Fig. VI-9); these
denote the F/R names which have the attribute shortened names in
AM2 in their set of attributes. This procedure can also be
reversed, i.e. the F/R name is known and it is desired to determine
the names of all of its attributes. The task would be done by
first performing an equal to search on AM3 with the F/R name in
question and then energizing Ig of AM3 (see Fig. VI-9) in order
to cause a flip flop in ARRAY III's response register to be set.
The word in ARRAY III designated by this set flip flop can be
found by energizing Ig and then loaded into ARRAY III's comparand
register. The attribute shortened names are found by
energizing 1^, which activates lines "n ... 2 1" in ARRAY I,
which in turn will set flip flops in the response register of AM2.
Function three is not yet complete. Suppose the data
base is implemented so that synonyms are of Type I. Then only
one of the attributes, the first one in its set, will have its
occurrences stored. Therefore, when another synonym attribute's
occurrences are required, the DBMS will have to access those F/Rs
where the stored occurrences of the first attribute in the synonym
set are located. To determine these F/R names, the DDP would
first perform functions one and two. This would cause the S
Register shown in Fig. VI-11 to be set by all the synonym attribute
shortened names. The CMU would then reset the comparand register
and energize interrogation line 1^, in turn. This will cause the
first set flip flop in the S Register to set the comparand register.
Energizing Ig and proceeding as above will provide the DDP with
the required F/R name(s).

Function  V 

To understand function V in more detail, refer to Figs.
VI-5, VI-12, and VI-13. The lines labeled (1,1'), (2,2'), and
(n,n') shown in Fig. VI-12 are connected to AM1 and AM2 through
ARRAY I. The output lines of ARRAY II(a) are connected to the
set ports of the flip flops in the response register for the
words in the first portion of the RAM/AM. The cells in ARRAY
II(a) are shown in Fig. VI-13. The setting of these cells by
the CMU is equivalent to having a 1 in ARRAY II, shown in Fig. VI-5
and discussed previously. To find the descriptors of an attribute,
assume that the ladder circuit in Fig. VI-12 has been reset
and function I has been performed. This will cause one of the
flip flops, say flip flop 2, in the ladder circuit to be set. If
the CMU then energizes the interrogation line, the second line will
energize the cells 2,i for i = 1, ..., D. The cells in the second
row whose flip flops are set will cause their respective output
lines to be energized and therefore set their respective flip flops
in the RAM/AM's response register. The CMU can then retrieve the
descriptors from the RAM/AM by interrogating its response register.
These descriptors will be for the attribute whose name is stored
in the second word of AM1.

Function  VI 

The detailed description of function VI can be given
through Fig. VI-6 and Fig. VI-14. The lines (1,1'), (2,2'), ...,
(m,m') shown in Fig. VI-14 are connected to ARRAY IV, which is
connected to AM3 and AM4. Lines labeled $j_1, j_2, \ldots$
are connected to the set ports of the flip flops in the
response register for the words in the second portion of the RAM/AM.
To find the descriptors of an F/R, assume that the ladder circuit
in Fig. VI-14 has been reset and that the other functions have
been performed. Then the F/R name(s) in question will have set
one or more of the flip flops in the ladder circuit. Assume that
there was one F/R name, e.g. the name stored in the first word of
AM3. The CMU would energize interrogation line I to obtain the
F/R name's descriptors. The energizing of line I causes lines
$j_1$ and $j_2$ to set the flip flops in the response register of
the second portion of the RAM/AM. The CMU would then read into its
memory the contents of those words in the RAM/AM and AM3 whose
response register is set. This would provide the same data as
shown above in discussing Fig. VI-6.
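
The mapping from one F/R word to its two consecutive RAM/AM words can be sketched as simple index arithmetic. The relation names, key names, and pointer strings below are hypothetical placeholders, as are the names `AM3`, `RAM_AM_SECOND`, and `fr_descriptors`; only the two-words-per-F/R layout comes from the text.

```python
# Hypothetical contents: word i of AM3 holds an F/R name, and words
# 2*i and 2*i + 1 of the second portion of the RAM/AM hold that F/R's
# primary key name and pointer descriptor.
AM3 = ["S", "P"]
RAM_AM_SECOND = ["S#", "pointer-to-S", "P#", "pointer-to-P"]


def fr_descriptors(response_register):
    """Return (F/R name, primary key, pointer) for every responding word."""
    results = []
    for i, bit in enumerate(response_register):
        if bit:
            key = RAM_AM_SECOND[2 * i]
            pointer = RAM_AM_SECOND[2 * i + 1]
            results.append((AM3[i], key, pointer))
    return results
```

A response register of `[1, 0]` selects only the first F/R and therefore the first two words of the second portion.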

SUMMARY 

This chapter has provided a description of
a dictionary/directory processor (DDP). The description was
given at two levels of detail. One was generic, with illustrations
from the example shown in Fig. V-1. The other description was
detailed and emphasized the gate and flip flop level. Both
levels were discussed in relation to six defined functions that
are performed by DBMSs.

The  next  chapter  contains  a presentation  of  how  these 
functions  can  be  utilized  by  describing  four  major  jobs  that 
can  be  submitted  to  a DBMS.  These  jobs  will  then  be  used  to 
develop  a method  for  evaluating  the  DDP. 


Chapter  VII 

DICTIONARY/DIRECTORY  PROCESSOR  EVALUATION 


INTRODUCTION 

It  is  customary  to  develop  a method  for  evaluating  a new 
approach  to  performing  an  existent  job.  This  situation  exists 
with  the  DDP.  The  contents  of  this  chapter  present  a method  for 
evaluating  the  DDP  and  the  results  obtained  from  its  evaluation. 

In reality it is very difficult to determine the
DDP's value, but a comparison to a total software implementation
on a sequential computer does provide an indication of its
value. The criterion chosen for this evaluation is the time
required for the DDP and a sequential computer to execute the same
jobs. A total software implementation on a sequential computer
was chosen because this is the current method by which a DBMS
performs its functions on its data dictionary and data directory.

The sequential computer chosen for this evaluation of the
DDP is a fictitious machine called the MIX computer. This machine
has been developed and described by Knuth [24]. MIX was chosen
because it is not an existing machine but it is like so many
existing machines. Knuth states that [24], "MIX is very much like
nearly every computer now in existence, except that it is, perhaps,
nicer. The language of MIX has been designed to be powerful
enough to allow brief programs to be written for most algorithms,
yet simple enough so that its operations are easily learned." It
is this machine and its assembly language (MIXAL) that were used
to derive the timings for comparison with the DDP.

The timings for the DDP and its interfacing computer were
obtained by using times to perform "micro-functions" on an
existing AP. The existing AP that was chosen was the STARAN [18].
The major reasons for utilizing an existing machine for these
timings were to:

1)  show the DDP's feasibility; and

2)  observe "realistic" timings for a non-existent
piece of hardware.

The choice of this machine, however, not only affects the timings
for the DDP's functions, but also its word or field size, data
transmission time, and degree of parallelism. It should also be
noted that the DDP does not require, nor does it utilize, all of
the STARAN's capabilities, e.g. its arithmetic capability.

The timings for the DDP are also influenced by assuming
that the RAM/AM is an AM as shown in the many figures in Chapter
VI. The timings, however, will not vary much if the RAM/AM is
modeled as a RAM, because in the timing equations no searches
are performed on the RAM/AM. Data are only retrieved from the
RAM/AM. Whether the RAM/AM is a RAM or an AM
may be important when considering cost, maintenance, etc. If it
were implemented as a RAM then the first portion of the RAM/AM
and the second portion could be separated as two RAMs under the
control of the control memory unit (CMU).

The vehicle utilized for comparing the two systems consists
of four job types. These job types are composed of various
subfunctions. Timing equations are developed for the subfunctions
and added together to obtain timing equations for the jobs. With
these timing equations the two systems can then be compared.

This chapter is concerned with the following:

1)  the development of subfunctions and their timing
equations;

2)  the application of these subfunctions and timing
equations to the development of overall timing
equations for the four job types utilizing the
DDP;

3)  the search technique and the data structures chosen
for a sequential computer's data dictionary and
partial directory;

4)  the development of the timing equations, using
MIXAL, to perform each of the four job types on the
sequential computer; and

5)  the comparison of the results obtained for the DDP
and the sequential computer.

DICTIONARY/DIRECTORY PROCESSOR MACRO TIMING EQUATIONS

The timing equations to be developed for the jobs will be
for the time increment beginning when the sequential computer's
CPU notifies the DDP control of a job to be executed until the
DDP supplies the TUMA with its results. The same timing
increment shall be used for the MIX computer. In this case the
total data dictionary and data directory are assumed to be stored
in main memory. Thus, this will not require any accessing of
peripheral devices to obtain data. However, it will require
time to move data within main memory from the data dictionary
and data directory to the user's memory area (UMA).

The parameters used to describe the above-mentioned jobs,
the data dictionary and the data directory are as follows:

1)  P, the number of computer words per F/R name;

2)  P^, the number of computer words per attribute
name;

3)  $n_1$, the number of F/Rs related to a given F/R and
supported by the DBMS;

4)  $n_2$, the number of attributes in a given F/R;

5)  $n_3$, the number of associated F/Rs of a given
attribute;

6)  $n_4$, the number of associated F/Rs of a given
number ($n_5$) of attributes;

7)  $n_5$, the number of attributes specified in Job 2;

8)  $n_6$, the number of synonyms of a given attribute; and

9)  TA, the total number of attributes in the data base.

These parameters will be varied in the evaluation presented at the
end of this chapter. But, more importantly, they will be used in
the body of this chapter to develop the timing equations for the
evaluation.
In the description of the DDP in Chapter VI, six functions
were defined and utilized. These functions need to be analyzed
and explained at a more detailed level so that timing equations
can be developed. This level can be thought of as a
"micro-function" level. Consider the following instructions as
micro-functions, with their execution timings obtained from [18]
and an unpublished STARAN source:


MICRO-FUNCTIONS                                  TIME IN μ-SECONDS

ENR  (energize a line)                           $t_{ENR} = .13$

FRF  (find and reset first response)             $t_{FRF} = .18$

LCA  (load comparand from an AM)                 $t_{LCA} = .64$

CLR  (clear register (set = 0))                  $t_{CLR} = .13$

SCM  (store comparand in control
     memory unit (CMU))                          $t_{SCM} = .7$

TRN  (transfer data to and from
     the sequential computer and
     the CMU)                                    $t_{TRN} = .3$/word (word length
                                                 $\le$ 32 bits)

FRA  (find first and reset all other
     responders)                                 $t_{FRA} = .8$

LCM  (load comparand from the CMU)               $t_{LCM} = .7$

XXY  (boolean AND operation on
     response registers X and Y
     with its result stored in X)

YXY  (boolean AND operation on
     response registers X and Y
     with its result stored in Y)

EQC  (equal to search on AM with
     the comparand's contents)                   $t_{EQC} = (.18)(\text{bits/word}) + (.8)$
                                                 $= (.18)(31) + (.8) = 6.38$


These micro-functions can be put together to form "macro-functions."
It should be noted that the AMs described earlier have only one
response register. This register will be referred to as Y. There
will also exist another register, X (see STARAN [18]), to be used in
equal to searches. The other note is that the MIX computer word is
31 bits long and the maximum effective STARAN field length is 32
bits. To maintain compatibility between the two for the following
examples, the length of a computer word/field will be 31 bits.


The rest of this section of the chapter is partitioned
into six parts. Each provides a development of timing equations
composed of a series of the above micro-functions performed in an
exact sequence. The time required for each micro-function in the
sequence is summed to form a timing equation. These macro-functions
are equal to or are a part of one of the six functions mentioned
previously. The development described in the next section of
this chapter utilizes these timing equations to form the test bed
for the DDP's evaluation.

FUNCTION I


The  first  function  determines  if,  in  fact,  the  F/R  names 
and  attribute  names  exist  in  the  data  base.  If  they  do,  the  DDP 
provides  their  names  and  their  shortened  names  to  the  DBMS. 

Single  Equal  to  Search  (SETS) 

To determine if an F/R name or attribute name exists, an
equal to search (ETS) must be performed on the proper AM after
the name in question is sent to the DDP. The SETS macro-function
and its performance time for one computer word is:

INSTRUCTION    TIME         EXPLANATION

(01) TRN       $t_{TRN}$    Transfer F/R name to the DDP's CMU

(02) LCM       $t_{LCM}$    Loads comparand register from the
                            CMU with the value to be searched

(03) CLR Y     $t_{CLR}$    Sets response register Y to zero

(04) EQC       $t_{EQC}$    Performs an equality search on all
                            words in the AM and sets response
                            register Y.

The total time to perform SETS is

$T_{SETS} = t_{TRN} + t_{LCM} + t_{CLR} + t_{EQC}$.
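
As a worked instance of the SETS equation, the sketch below sums the four micro-function times for a 31-bit word, using the values given in the micro-function listing earlier in this chapter. The constant and function names are illustrative only.

```python
BITS_PER_WORD = 31                    # common MIX/STARAN word length used here

# Micro-function times in microseconds, as listed earlier in the chapter.
T_TRN = 0.3                           # transfer one word to or from the CMU
T_LCM = 0.7                           # load comparand from the CMU
T_CLR = 0.13                          # clear a response register
T_EQC = 0.18 * BITS_PER_WORD + 0.8    # equal to search on one word: 6.38


def t_sets():
    """Time for a single-word equal to search: TRN + LCM + CLR + EQC."""
    return T_TRN + T_LCM + T_CLR + T_EQC
```

Under these values the search itself dominates: 6.38 of the 7.51 microseconds are spent in EQC.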

Multiple  Word  ETS  (METS) 

If the variable to be searched is greater than one computer
word and equal to or less than eight computer words, the multiple
word ETS macro-function, METS, must be performed. The
micro-functions for METS are:

INSTRUCTION    TIME               EXPLANATION

(01) TRN       (w) $t_{TRN}$      Transfer F/R name to the DDP's CMU

(02) SET X     $t_{SET}$          Sets response register X to 1

(03) CLR Y     (w) $t_{CLR}$      Sets the Y register equal to zero

(04) LCM       (w) $t_{LCM}$      Load the comparand register with
                                  one word from the CMU

(05) EQC       (w) $t_{EQC}$      Performs an equality search on all
                                  words in the AM and sets response
                                  register Y

(06) XXY       (w) $t_{XXY}$      "ANDS" registers X and Y and stores
                                  result in X

The total time to perform METS is

$T_{METS} = (w)(t_{TRN} + t_{LCM} + t_{EQC}) + (w)(t_{CLR} + t_{XXY}) + t_{SET}$,

where w is the number of words to be searched. (The size of the
variable to be searched was limited to eight computer words because
a STARAN AM is 256 bits wide.)

Connect  (CONN) 

ETS is the macro-function that can provide the first part
of the first function. To find an attribute's shortened name,
given its name, or vice versa, requires the energizing of the
interrogation line of AM1 or AM2, respectively. Similarly, to
find an F/R's shortened name, given its name, or vice versa,
requires the energizing of the interrogation line of AM3 or AM4,
respectively. The CONN macro-function and its performance time
for these four possible connections is:

INSTRUCTION    TIME         EXPLANATION

(01) ENR       $t_{ENR}$    energize a line

The total time to perform CONN is $T_{CONN} = t_{ENR}$.

* w is the number of computer words to be searched (i.e. 1 < w ≤ 8).
A w in parentheses indicates that that micro-function must be
performed for each computer word.

Single  Retrieve  (SRET) 

The final part of the first function must provide to the
TUMA the contents of a word(s) in an AM. This can be achieved
by the retrieve macro-function. The SRET macro-function and its
performance time for one AM word of w computer words is:

INSTRUCTION    TIME               EXPLANATION

(01) FRF       $t_{FRF}$          Find and reset first response
                                  in the AM

(02) LCA       (w) $t_{LCA}$      Load the comparand with w words
                                  from the AM

(03) SCM       (w) $t_{SCM}$      Store in the CMU the w words that
                                  are in the comparand

(04) TRN       (w) $t_{TRN}$      Transfer the w words in the CMU
                                  to the sequential computer

The total time to perform SRET is

$T_{SRET} = t_{FRF} + (w)(t_{LCA} + t_{SCM} + t_{TRN})$.

Multiple  Retrieve  (MRET) 

If there is more than one AM word (n) to be retrieved, then
a multiple retrieve macro-function, MRET, is desired. The MRET
macro-function and its performance time for n AM words of w computer
words is:

INSTRUCTION        TIME                 EXPLANATION

(01)  FRF          (n+1) $t_{FRF}$      Find and reset first response
                                        in the AM (this is done n+1
                                        times)

(02)  LCA          (w)(n) $t_{LCA}$     Load the comparand n times
                                        with w words from the AM

(03)  SCM          (w)(n) $t_{SCM}$     Store in the CMU the w words
                                        that are in the comparand
                                        (this is done n times)

(04)  TRN          (w)(n) $t_{TRN}$     Transfer the w words in the
                                        CMU to the sequential computer
                                        (this is done n times)

The total time to perform MRET is

$$T_{MRET} = (w)(n)[t_{LCA} + t_{SCM} + t_{TRN}] + (n+1)t_{FRF},$$

where the extra FRF instruction is necessary to determine when n
retrieved words are completed.
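The difference between one MRET and n separate SRETs can be made concrete by subtracting their micro-function tallies. This is a minimal sketch under the timing equations given above; the helper names are hypothetical and chosen only for the example.

```python
from collections import Counter


def sret_counts(w):
    """Micro-function counts for a single retrieve of one w-word AM word."""
    return Counter({"FRF": 1, "LCA": w, "SCM": w, "TRN": w})


def mret_counts(n, w):
    """Micro-function counts for a multiple retrieve of n w-word AM words."""
    return Counter({"FRF": n + 1, "LCA": w * n, "SCM": w * n, "TRN": w * n})


def repeated(counts, k):
    """Counts for performing the same macro-function k times."""
    return Counter({name: k * c for name, c in counts.items()})


def extra_cost_of_mret(n, w):
    """Micro-functions MRET needs beyond n separate SRETs.

    Counter subtraction keeps only positive differences, so equal
    terms drop out of the result.
    """
    return mret_counts(n, w) - repeated(sret_counts(w), n)


if __name__ == "__main__":
    print(extra_cost_of_mret(4, 2))
```

The only surplus is the single FRF that detects that no responders remain, which matches the remark following the MRET equation.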


Comparand Retrieve (CRET)

Finally, if the word to be retrieved is in the comparand
register, then the macro-function CRET is desired. The CRET
macro-function and its performance time for w computer words is:

INSTRUCTION        TIME                 EXPLANATION

(01)  CLR          $t_{CLR}$            Set the Y register to zero

(02)  SCM          (w) $t_{SCM}$        Store in the CMU the w words
                                        that are in the comparand

(03)  TRN          (w) $t_{TRN}$        Transfer the w words in the
                                        CMU to the sequential computer

The total time to perform CRET is

$$T_{CRET} = (w)[t_{SCM} + t_{TRN}] + t_{CLR}.$$


FUNCTION  II 

The  second  function  determines  if  an  attribute(s)  has  any 
synonyms.  If  it  does,  then  their  attribute  names  must  be  deter- 
mined depending  on  whether  the  DBMS  supports  Type  I or  Type  II 
synonyms  and  the  type  of  command  making  the  request. 

Synonyms  (SYNO) 

To determine if an attribute has any synonyms, a response
in either AM1 or AM2 must exist. Then the SYNO macro-function
will find the respondent attribute's synonyms. The SYNO
macro-function and its performance time is:

INSTRUCTION        TIME                 EXPLANATION

(01)  ENR          $t_{ENR}$            Energize $I_2$ of either AM1 or AM2

(02)  ENR          $t_{ENR}$            Energize two SI lines (the second
                                        is $SI_2$)

The total time to perform SYNO is $T_{SYNO} = (2)t_{ENR}$.


FUNCTION  III 

The  third  function  determines  if  the  attributes  associated 
with  their  respective  F/R  name  in  the  user's  request  are  correct. 


F/R  to  Attribute  (FRTA) 

FRTA is a macro-function that provides those attribute
shortened names that are associated with a given F/R name already
found in AM3. The FRTA macro-function and its performance time is:

INSTRUCTION        TIME                              EXPLANATION

(01)  ENR          $t_{ENR}$                         Energize an interrogation line
                                                     of AM3

(02)  FRF          $t_{FRF}$                         Find and reset the first
                                                     response register of ARRAY III

(03)  LCA          $\lceil TA/32 \rceil\, t_{LCA}$   Load the comparand register
                                                     of ARRAY III from its AM with
                                                     the contents of the previous
                                                     respondent word

(04)  ENR          $t_{ENR}$                         Energize an interrogation line
                                                     of ARRAY III

The total time to perform FRTA is

$$T_{FRTA} = (2)t_{ENR} + t_{FRF} + \lceil TA/32 \rceil\, t_{LCA},$$

where TA = total number of attributes in the data base and
$\lceil x \rceil$ is the least integer $\geq x$.
Attribute to F/R (ATFR)

The ATFR macro-function provides the F/R name, given one or
more attribute shortened names. The ATFR macro-function and its
performance time is:

INSTRUCTION        TIME                              EXPLANATION

(01)  ENR          $t_{ENR}$                         Energize interrogation line
                                                     $I_2$ of AM2

(02)  CLR C        $t_{CLR}$                         Set the comparand register of
                                                     ARRAY III to zero

(03)  ENR          $t_{ENR}$                         Energize interrogation line
                                                     $I_7$ of ARRAY III

(04)  ENR          $t_{ENR}$                         Energize interrogation line
                                                     Ig of ARRAY III

(05)  SET X        $t_{SET}$                         Set the X register of ARRAY
                                                     III to one

(06)  EQC          $\lceil TA/32 \rceil\, t_{EQC}$   Perform an equal to search on
                                                     ARRAY III with its comparand's
                                                     contents

(07)  XXY          $\lceil TA/32 \rceil\, t_{XXY}$   Perform a Boolean AND operation
                                                     on the X and Y registers of
                                                     ARRAY III and store the results
                                                     in the X register

(08)  CLR Y        $\lceil TA/32 \rceil\, t_{CLR}$   Set the Y register of ARRAY
                                                     III to zero

(09)  YXY          $t_{YXY}$                         Perform a Boolean AND operation
                                                     on the X and Y registers of
                                                     ARRAY III and store the results
                                                     in the Y register

(10)  ENR          $t_{ENR}$                         Energize interrogation line
                                                     $I_5$ of ARRAY III

(11)  CLR Y        $t_{CLR}$                         Set the Y register of ARRAY
                                                     III to zero

The total time to perform ATFR is

$$T_{ATFR} = \lceil TA/32 \rceil\, t_{EQC} + t_{SET} + (4)t_{ENR} + (\lceil TA/32 \rceil + 2)\, t_{CLR} + \lceil TA/32 \rceil\, t_{XXY} + t_{YXY}.$$


Single Synonym to F/R (SSFR)

The synonym to F/R (STFR) macro-function is similar to ATFR.
Given a data base with either Type I or Type II synonyms, and given
that the SYNO macro-function has been performed, the STFR
macro-function provides the name of the F/R where the synonym's
occurrences are stored. The SSFR macro-function and its
performance time for $n_6 = 1$ is:

INSTRUCTION        TIME                              EXPLANATION

(01)  CLR S        $t_{CLR}$                         Set the S register of ARRAY
                                                     III to zero

(02)  ENR          $t_{ENR}$                         Energize interrogation line
                                                     Ig of AM2

(03)  CLR C        $t_{CLR}$                         Set the comparand register of
                                                     ARRAY III to zero

(04)  ENR          $t_{ENR}$                         Energize interrogation line
                                                     $I_9$ of ARRAY III

(05)  ENR          $t_{ENR}$                         Energize interrogation line
                                                     Ig of ARRAY III

(06)  SET X        $t_{SET}$                         Set the X register of ARRAY
                                                     III to one

(07)  EQC          $\lceil TA/32 \rceil\, t_{EQC}$   Perform an equal to search on
                                                     ARRAY III with its comparand's
                                                     contents

(08)  XXY          $\lceil TA/32 \rceil\, t_{XXY}$   Perform a Boolean AND operation
                                                     on the X and Y registers of
                                                     ARRAY III and store the results
                                                     in the X register

(09)  CLR Y        $\lceil TA/32 \rceil\, t_{CLR}$   Set the Y register of ARRAY
                                                     III to zero

(10)  YXY          $t_{YXY}$                         Perform a Boolean AND operation
                                                     on the X and Y registers of
                                                     ARRAY III and store the results
                                                     in the Y register

(11)  ENR          $t_{ENR}$                         Energize interrogation line
                                                     $I_5$ of ARRAY III

(12)  CLR Y        $t_{CLR}$                         Set the Y register of ARRAY
                                                     III to zero

The total time to perform SSFR is

$$T_{SSFR} = \lceil TA/32 \rceil\, t_{EQC} + t_{SET} + (4)t_{ENR} + (\lceil TA/32 \rceil + 3)\, t_{CLR} + \lceil TA/32 \rceil\, t_{XXY} + t_{YXY}.$$


Multiple Synonym to F/R (MSFR)

If there is more than one synonym, i.e. $n_6 \geq 2$, then the
STFR macro-function MSFR and its performance time is:

INSTRUCTION        TIME                                     EXPLANATION

(01)  CLR S        $t_{CLR}$                                Set the S register of ARRAY
                                                            III to zero

(02)  ENR          $t_{ENR}$                                Energize interrogation line
                                                            $I_2$ of AM2 and/or an
                                                            interrogation line of AM1

(03)  CLR C        $(n_6)\, t_{CLR}$                        Set the comparand register
                                                            of ARRAY III to zero

(04)  ENR          $(n_6+1)\, t_{ENR}$                      Energize interrogation line
                                                            $I_9$ of ARRAY III

(05)  FRF          $(n_6+1)\, t_{FRF}$                      Find and reset first response
                                                            in register S of ARRAY III

(06)  ENR          $(n_6)\, t_{ENR}$                        Energize interrogation line
                                                            Ig of ARRAY III

(07)  SET X        $(n_6)\, t_{SET}$                        Set the X register of ARRAY
                                                            III to one

(08)  EQC          $(n_6)\lceil TA/32 \rceil\, t_{EQC}$     Perform an equal to search on
                                                            ARRAY III with its comparand's
                                                            contents

(09)  XXY          $(n_6)\lceil TA/32 \rceil\, t_{XXY}$     Perform a Boolean AND operation
                                                            on the X and Y registers of
                                                            ARRAY III and store the results
                                                            in the X register

(10)  CLR Y        $(n_6)\lceil TA/32 \rceil\, t_{CLR}$     Set the Y register of ARRAY
                                                            III to zero

(11)  YXY          $(n_6)\, t_{YXY}$                        Perform a Boolean AND operation
                                                            on the X and Y registers of
                                                            ARRAY III and store the
                                                            results in the Y register

(12)  ENR          $(n_6)\, t_{ENR}$                        Energize interrogation line
                                                            $I_5$ of ARRAY III

(13)  CLR Y        $(n_6)\, t_{CLR}$                        Set the Y register of ARRAY
                                                            III to zero

The total time to perform MSFR is

$$
\begin{aligned}
T_{MSFR} ={}& [(2 + \lceil TA/32 \rceil)\, n_6 + 1]\, t_{CLR} + (n_6)\lceil TA/32 \rceil\, t_{EQC} \\
&+ (3n_6 + 2)\, t_{ENR} + (n_6)\, t_{SET} + (n_6)\lceil TA/32 \rceil\, t_{XXY} \\
&+ (n_6)\, t_{YXY} + (n_6 + 1)\, t_{FRF},
\end{aligned}
$$

where the extra ENR and FRF instructions are necessary to determine
when $n_6$ synonyms are completed.
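The MSFR total is the sum, instruction by instruction, of the thirteen rows of its table. The following sketch tallies the rows and compares them with the closed form. It assumes that the ceiling term multiplying the EQC, XXY, and CLR Y rows is the number of computer words needed to hold TA bits, with a 32-bit word inferred from the statement that eight computer words fill a 256-bit AM; the function and parameter names are hypothetical.

```python
from collections import Counter
from math import ceil


def msfr_counts(n6, ta, word_bits=32):
    """Tally the MSFR table row by row for n6 synonyms and ta attributes."""
    c = ceil(ta / word_bits)  # assumed meaning of the ceiling term
    rows = [
        ("CLR", 1), ("ENR", 1), ("CLR", n6), ("ENR", n6 + 1),
        ("FRF", n6 + 1), ("ENR", n6), ("SET", n6), ("EQC", n6 * c),
        ("XXY", n6 * c), ("CLR", n6 * c), ("YXY", n6), ("ENR", n6),
        ("CLR", n6),
    ]
    counts = Counter()
    for name, times in rows:
        counts[name] += times
    return counts


def msfr_closed_form(n6, ta, word_bits=32):
    """Coefficients of the MSFR timing equation."""
    c = ceil(ta / word_bits)
    return Counter({
        "CLR": (c + 2) * n6 + 1, "EQC": n6 * c, "ENR": 3 * n6 + 2,
        "SET": n6, "XXY": n6 * c, "YXY": n6, "FRF": n6 + 1,
    })


if __name__ == "__main__":
    print(msfr_counts(3, 100) == msfr_closed_form(3, 100))
```

Tallying the rows independently of the closed form is a quick way to catch a miscounted ENR or CLR when a table is revised.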


FUNCTION  IV 

If there were more than one F/R involved in a user's
request and there was an implied relationship among them, then the
fourth function would be to determine if, in fact, the DBMS
supported a relationship among these F/Rs.

Relationship (RELA)

The macro-function relationship (RELA) determines whether
or not there exists a supported relationship between two or more
F/Rs. The RELA macro-function and its performance time is:

INSTRUCTION        TIME                 EXPLANATION

(01)  ENR          $t_{ENR}$            Energize an interrogation line of
                                        AM4 or an interrogation line of AM3

(02)  ENR          $t_{ENR}$            Energize two SI lines of ARRAY IV

The total time to perform RELA is $T_{RELA} = (2)t_{ENR}$.


FUNCTION  V 

The  fifth  function  provides  the  rest  of  the  descriptors  for 
each  attribute  name  and  synonym  name  involved  in  a requesting 
command. 


Single  Attribute  to  Descriptors  (SATD) 

The macro-function SATD provides the descriptors for one
attribute. The macro-function SATD and its performance time is:

INSTRUCTION        TIME                 EXPLANATION

(01)  CLR          $t_{CLR}$            Set the ladder circuit of ARRAY II
                                        to zero

(02)  ENR          $t_{ENR}$            Energize the interrogation line
                                        of AM1 or AM2

(03)  ENR          $t_{ENR}$            Energize the interrogation line I
                                        of ARRAY II

The total time to perform SATD is $T_{SATD} = (2)t_{ENR} + t_{CLR}$.
Multiple Attribute to Descriptors (MATD)

The macro-function MATD provides the descriptors for n
attributes, where n is greater than one. The macro-function MATD
and its performance time is:

INSTRUCTION        TIME                 EXPLANATION

(01)  CLR          $t_{CLR}$            Set the ladder circuit of
                                        ARRAY II to zero

(02)  ENR          $t_{ENR}$            Energize the interrogation
                                        line Ig of AM1 or AM2

(03)  ENR          (n+1) $t_{ENR}$      Energize the interrogation
                                        line I of ARRAY II

(04)  FRF          (n+1) $t_{FRF}$      Find and reset the first
                                        respondent of the ladder
                                        circuit of ARRAY II

The total time to perform MATD is

$$T_{MATD} = t_{CLR} + (n+2)t_{ENR} + (n+1)t_{FRF},$$

where the extra ENR and FRF instructions are necessary to
determine when the n attributes are completed.

Synonym  I to  Descriptors  (SITD) 

The macro-function SITD provides the descriptors for the
stored attribute of Type I synonyms. The macro-function SITD
and its performance time is:

INSTRUCTION        TIME                 EXPLANATION

(01)  CLR          $t_{CLR}$            Set the ladder circuit of ARRAY
                                        II to zero

(02)  ENR          $t_{ENR}$            Energize an interrogation line
                                        of AM1 or AM2

(03)  ENR          $t_{ENR}$            Energize two SI interrogation
                                        lines of ARRAY I

(04)  ENR          $t_{ENR}$            Energize interrogation line I of
                                        ARRAY II

(05)  FRA          $t_{FRA}$            Find the first respondent in AM2
                                        or AM1 and reset all others

(06)  CLR          $t_{CLR}$            Set the ladder circuit of ARRAY
                                        II to zero

The total time to perform SITD is $T_{SITD} = (3)t_{ENR} + (2)t_{CLR} + t_{FRA}$.


FUNCTION  VI 

The sixth function provides the F/R primary key and pointer
for each F/R and related F/R involved in a query to the data base.

Single F/R to Descriptors (SFRD)

The macro-function SFRD provides the descriptors for one F/R
name. The macro-function SFRD and its performance time is:

INSTRUCTION        TIME                 EXPLANATION

(01)  CLR          $t_{CLR}$            Set the ladder circuit of ARRAY
                                        II to zero

(02)  ENR          $t_{ENR}$            Energize interrogation line Ig of
                                        AM4 or an interrogation line of AM3

(03)  ENR          $t_{ENR}$            Energize interrogation line I of
                                        ARRAY II

The total time to perform SFRD is $T_{SFRD} = (2)t_{ENR} + t_{CLR}$.

Multiple  F/R  to  Descriptors  (MFRD) 

The macro-function MFRD provides the descriptors for n F/Rs,
where n is greater than one. The macro-function MFRD and its
performance time is:

INSTRUCTION        TIME                 EXPLANATION

(01)  CLR          $t_{CLR}$            Set the ladder circuit of
                                        ARRAY II to zero

(02)  ENR          $t_{ENR}$            Energize interrogation line
                                        Ig of AM4 or an interrogation
                                        line of AM3

(03)  ENR          (n+1) $t_{ENR}$      Energize interrogation line I
                                        of ARRAY II

(04)  FRF          (n+1) $t_{FRF}$      Find and reset the first
                                        respondent of the ladder
                                        circuit of ARRAY II

The total time to perform MFRD is

$$T_{MFRD} = (n+2)t_{ENR} + (n+1)t_{FRF} + t_{CLR},$$

where the extra ENR and FRF instructions are necessary to determine
when the n F/Rs are completed.

This completes the development of the timing equations for
the specific macro-functions making up the six functions discussed
in the previous chapter. The macro-functions are summarized in
Table VII-1. The next portion of this chapter contains the
development of the timing equations for the four generic jobs
making up the test bed for the DDP's evaluation.

DICTIONARY/DIRECTORY PROCESSOR JOB TIMING EQUATIONS

The following are the four generic jobs to be used in
developing timing equations for the DDP. Each job is followed by
an example referring to the Suppliers Data Model shown in Fig.
V-1.

1)  Given the F/R name, the DBMS is required to provide
all its occurrences.

Example: To provide all the occurrences of the F/R S and
define it as an F/R W.

GET W(S.S#, S.SNAME, S.STATUS, S.CITY)

RESULT    S#    SNAME    STATUS    CITY

          S1    SMITH    20        LONDON
          S2    JONES    10        PARIS
          S3    BLAKE    30        PARIS
          S4    CLARK    20        LONDON
          S5    ADAMS    30        ATHENS

2)  Given an F/R name and a number of its attribute names,
the DBMS is required to provide a subset of the
occurrences of the F/R.

Example: To provide all the occurrences of the F/R S
whose occurrence of CITY is equal to LONDON and to define the
results as an F/R W.
Table VII-1. Macro-Function Timing Equations for the Dictionary/
Directory Processor.

FUNCTION I

Macro-function:   SETS
Explanation:      Single Equal to Search
Timing Equation:  $T_{SETS} = t_{TRN} + t_{LCM} + t_{CLR} + t_{EQC}$

Macro-function:   METS
Explanation:      Multiple word ETS
Timing Equation:  $T_{METS} = (w)[t_{TRN} + t_{LCM} + t_{EQC}] + (w+1)[t_{CLR} + t_{XXY}] + t_{SET}$

Macro-function:   CONN
Explanation:      Connect
Timing Equation:  $T_{CONN} = t_{ENR}$

Macro-function:   SRET
Explanation:      Single Retrieve
Timing Equation:  $T_{SRET} = t_{FRF} + (w)[t_{LCA} + t_{SCM} + t_{TRN}]$

Macro-function:   MRET
Explanation:      Multiple Retrieve
Timing Equation:  $T_{MRET} = (w)(n)[t_{LCA} + t_{SCM} + t_{TRN}] + (n+1)t_{FRF}$

Macro-function:   CRET
Explanation:      Comparand Retrieve
Timing Equation:  $T_{CRET} = (w)[t_{SCM} + t_{TRN}] + t_{CLR}$

FUNCTION II

Macro-function:   SYNO
Explanation:      Synonyms
Timing Equation:  $T_{SYNO} = (2)t_{ENR}$

FUNCTION III

Macro-function:   FRTA
Explanation:      F/R to Attribute
Timing Equation:  $T_{FRTA} = (2)t_{ENR} + t_{FRF} + \lceil TA/32 \rceil\, t_{LCA}$

Macro-function:   ATFR
Explanation:      Attribute to F/R
Timing Equation:  $T_{ATFR} = \lceil TA/32 \rceil\, t_{EQC} + t_{SET} + (4)t_{ENR} + (\lceil TA/32 \rceil + 2)\, t_{CLR} + \lceil TA/32 \rceil\, t_{XXY} + t_{YXY}$

Macro-function:   SSFR
Explanation:      Single Synonym to F/R
Timing Equation:  $T_{SSFR} = \lceil TA/32 \rceil\, t_{EQC} + t_{SET} + (4)t_{ENR} + (\lceil TA/32 \rceil + 3)\, t_{CLR} + \lceil TA/32 \rceil\, t_{XXY} + t_{YXY}$

Macro-function:   MSFR
Explanation:      Multiple Synonym to F/R
Timing Equation:  $T_{MSFR} = [(2 + \lceil TA/32 \rceil)\, n_6 + 1]\, t_{CLR} + (n_6)\lceil TA/32 \rceil\, t_{EQC} + (3n_6 + 2)\, t_{ENR} + (n_6)\, t_{SET} + (n_6)\lceil TA/32 \rceil\, t_{XXY} + (n_6)\, t_{YXY} + (n_6 + 1)\, t_{FRF}$

FUNCTION IV

Macro-function:   RELA
Explanation:      Relationship
Timing Equation:  $T_{RELA} = (2)t_{ENR}$

FUNCTION V

Macro-function:   SATD
Explanation:      Single Attribute to Descriptors
Timing Equation:  $T_{SATD} = (2)t_{ENR} + t_{CLR}$

Macro-function:   MATD
Explanation:      Multiple Attribute to Descriptors
Timing Equation:  $T_{MATD} = t_{CLR} + (n+2)t_{ENR} + (n+1)t_{FRF}$

Macro-function:   SITD
Explanation:      Synonym I to Descriptors
Timing Equation:  $T_{SITD} = (3)t_{ENR} + (2)t_{CLR} + t_{FRA}$

FUNCTION VI

Macro-function:   SFRD
Explanation:      Single F/R to Descriptors
Timing Equation:  $T_{SFRD} = (2)t_{ENR} + t_{CLR}$

Macro-function:   MFRD
Explanation:      Multiple F/R to Descriptors
Timing Equation:  $T_{MFRD} = (n+2)t_{ENR} + (n+1)t_{FRF} + t_{CLR}$
GET W(S.S#, S.SNAME, S.STATUS, S.CITY): S.CITY = 'LONDON'

RESULT W  S#    SNAME    STATUS    CITY

          S1    SMITH    20        LONDON
          S4    CLARK    20        LONDON

3)  Given an attribute's occurrence(s) of Type II synonyms,
the DBMS is required to modify (e.g. delete, change
value, etc.) the attribute's and its synonym's occurrences
in all their associated F/Rs.

Example: To change the occurrence of STATUS to 20 for
those occurrences whose S# = S3.

HOLD W(S.S#, S.STATUS): S.S# = 'S3'

W.STATUS = '20'

UPDATE W

RESULT S  S#    SNAME    STATUS    CITY

          S1    SMITH    20        LONDON
          S2    JONES    10        PARIS
          S3    BLAKE    20        PARIS
          S4    CLARK    20        LONDON
          S5    ADAMS    30        ATHENS

If the attribute STATUS had any synonyms associated with other
F/Rs, then the DBMS would have to change all their occurrences
whose S# equaled S3, or an equivalent unique identifier to the
S3 occurrence of F/R S.

4)  Given an attribute's occurrence(s) of Type I synonyms,
the DBMS is required to modify (e.g. delete, change
value, etc.) the attribute's stored synonym occurrence(s)
in its associated F/R.

Example: To change the occurrence of an attribute name STAT
to 20.0 in the F/R K for those occurrences whose S# = S3. The
attribute STAT has its values stored with its synonym, STATUS, in
F/R S.

HOLD W(K.S#, K.STAT): K.S# = 'S3'

W.STATUS = '20.0'

UPDATE W

These three statements would be given by the user. The DBMS would
not change the value STAT in F/R K but would internally generate
equivalent statements as shown in the example for Job 3 and provide
the following:

RESULT S  S#    SNAME    STATUS    CITY

          S1    SMITH    20        LONDON
          S2    JONES    10        PARIS
          S3    BLAKE    20        PARIS
          S4    CLARK    20        LONDON
          S5    ADAMS    30        ATHENS
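The four generic jobs can be mimicked on the Suppliers data with a few lines of Python. This is only an illustration of what the GET, HOLD, and UPDATE examples above ask of the DBMS; the helper names (`get`, `hold_update`, `resolve`) and the synonym map are hypothetical and are not the query language or the DDP mechanism itself.

```python
SUPPLIERS = [
    {"S#": "S1", "SNAME": "SMITH", "STATUS": 20, "CITY": "LONDON"},
    {"S#": "S2", "SNAME": "JONES", "STATUS": 10, "CITY": "PARIS"},
    {"S#": "S3", "SNAME": "BLAKE", "STATUS": 30, "CITY": "PARIS"},
    {"S#": "S4", "SNAME": "CLARK", "STATUS": 20, "CITY": "LONDON"},
    {"S#": "S5", "SNAME": "ADAMS", "STATUS": 30, "CITY": "ATHENS"},
]

# Hypothetical Type I synonym map: (F/R, attribute) -> where values are stored.
TYPE_I_SYNONYMS = {("K", "STAT"): ("S", "STATUS")}


def get(rows, attrs, where=None):
    """GET: project attrs from the rows that satisfy the optional predicate."""
    return [{a: r[a] for a in attrs} for r in rows
            if where is None or where(r)]


def hold_update(rows, where, attr, value):
    """HOLD ... UPDATE: copy rows, changing attr where the predicate holds."""
    return [dict(r, **{attr: value}) if where(r) else dict(r) for r in rows]


def resolve(fr, attr):
    """Map a Type I synonym to the F/R and attribute that hold its values."""
    return TYPE_I_SYNONYMS.get((fr, attr), (fr, attr))


ALL_ATTRS = ["S#", "SNAME", "STATUS", "CITY"]
job1 = get(SUPPLIERS, ALL_ATTRS)                                   # Job 1
job2 = get(SUPPLIERS, ALL_ATTRS, lambda r: r["CITY"] == "LONDON")  # Job 2
job3 = hold_update(SUPPLIERS, lambda r: r["S#"] == "S3", "STATUS", 20)  # Job 3
_, stored_attr = resolve("K", "STAT")                              # Job 4
job4 = hold_update(SUPPLIERS, lambda r: r["S#"] == "S3", stored_attr, 20)
```

Job 4 differs from Job 3 only in the extra lookup that redirects the synonym STAT to the attribute in which its values are actually stored.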

JOB  1 

The first generic job is for the DBMS to provide all the
occurrences of an F/R given only the F/R name. To develop the
timing equation for Job 1, consider Fig. VII-1 and Table VII-2.

The process for Job 1 is to first find the F/R name and all
($n_2$) of its attribute names, shortened names and their descriptors.


Table VII-2. Macro-Function Timing Equations for Job 1 and the
Dictionary/Directory Processor.

SYNONYM I

STEP  MACRO-FUNCTION  TIMING EQUATION

1     METS            $T_{METS} = P[t_{TRN} + t_{LCM} + t_{EQC}] + (P+1)[t_{CLR} + t_{XXY}] + t_{SET}$

2     FRTA            $T_{FRTA} = (2)t_{ENR} + t_{FRF} + \lceil TA/32 \rceil\, t_{LCA}$

3     CONN            $T_{CONN} = t_{ENR}$

4     MATD            $T_{MATD} = t_{CLR} + (n_2+2)t_{ENR} + (n_2+1)t_{FRF}$

5     SRET            $T_{SRET} = n_2 t_{FRF} + (n_2)(P_1)[t_{LCA} + t_{SCM} + t_{TRN}] + 7n_2[t_{FRF} + t_{LCA} + t_{SCM} + t_{TRN}]$

6     CONN            $T_{CONN} = t_{ENR}$

7     RELA            $T_{RELA} = (2)t_{ENR}$

8     MFRD            $T_{MFRD} = (n_1+2)t_{ENR} + (n_1+1)t_{FRF} + t_{CLR}$

9     MRET            $T_{MRET} = (3n_1)[t_{LCA} + t_{SCM} + t_{TRN}] + (3n_1+1)t_{FRF}$

10    CRET            $T_{CRET} = P[t_{TRN} + t_{SCM}] + t_{CLR}$

SYNONYM II

STEP  MACRO-FUNCTION  TIMING EQUATION

1     METS            $T_{METS} = P[t_{TRN} + t_{LCM} + t_{EQC}] + (P+1)[t_{CLR} + t_{XXY}] + t_{SET}$

2     FRTA            $T_{FRTA} = (2)t_{ENR} + t_{FRF} + \lceil TA/32 \rceil\, t_{LCA}$

3     CONN            $T_{CONN} = t_{ENR}$

4     MATD            $T_{MATD} = t_{CLR} + (n_2+2)t_{ENR} + (n_2+1)t_{FRF}$

5     SRET            $T_{SRET} = n_2 t_{FRF} + (n_2)(P_1)[t_{LCA} + t_{SCM} + t_{TRN}] + 7n_2[t_{FRF} + t_{LCA} + t_{SCM} + t_{TRN}]$

6     CONN            $T_{CONN} = t_{ENR}$

7     SFRD            $T_{SFRD} = (2)t_{ENR} + t_{CLR}$

8     SRET            $T_{SRET} = (3)[t_{FRF} + t_{LCA} + t_{SCM} + t_{TRN}]$

9     CRET            $T_{CRET} = P[t_{TRN} + t_{SCM}] + t_{CLR}$
Once they are found they are retrieved. Then if the data base
has Type I synonyms, all ($n_1$) of the F/R's shortened names and
descriptors that are related to the given F/R are required.
This is necessary because to obtain the occurrences of some
attributes in the given F/R, its related F/Rs may have to be
accessed. The process must then multiply retrieve all the data
for the related F/Rs and the given F/R name. If the data base has
Type II synonyms then only the descriptors of the given F/R
need to be found and retrieved.

The first six steps shown in Fig. VII-1 and Table VII-2
for Job 1 are identical for synonyms Type I and II. These are
as follows:

Step 1)  A multiple (P) word equal to search is performed
to find the given F/R name in AM3;

Step 2)  Find all the attribute's shortened names in
AM2 that are associated with the F/R name in AM3;

Step 3)  Find all the attribute names in AM1 of all the
attribute shortened names found in AM2;

Step 4)  For each attribute ($n_2$), one at a time, find
their descriptors in the RAM/AM;

Step 5)  Before finding the next attribute's descriptors
(Step 4 again), retrieve the attribute's name,
shortened name, and its six descriptors;

Step 6)  To find the given F/R's shortened name requires
the connection of AM3 and AM4.

If the DBMS has Type I synonyms, then the following steps
are performed:

Step 7)  Find all the F/R names and shortened names in
AM3 and AM4 that are related to the F/R name
found in AM3 by Step 1;

Step 8)  For each ($n_1$) F/R, find its descriptors;

Step 9)  Before finding the next F/R's descriptors
(Step 8 again), retrieve the F/R's shortened
name and its descriptors;

Step 10)  Before ending the job, the given F/R name
stored in the comparand register of AM3 must
be retrieved.

If the DBMS has Type II synonyms, then the following steps
are performed:

Step 7)  For the given F/R name, find all its
descriptors;

Step 8)  For the given F/R name, retrieve all its
descriptors and shortened name;

Step 9)  Before ending the job, the given F/R name
stored in the comparand register of AM3 must
be retrieved.

To obtain the equation $T_{1I}$, which expresses the total time
to perform Job 1 for Type I synonyms, the times for each of the
macro-functions must be summed. This is accomplished by summing
the timing equations shown in Table VII-2 for synonym I.

$$T_{1I} = T_{METS} + T_{FRTA} + T_{CONN} + T_{MATD} + T_{SRET} + T_{CONN} + T_{RELA} + T_{MFRD} + T_{MRET} + T_{CRET}$$

$$
\begin{aligned}
T_{1I} ={}& (P)t_{LCM} + (P)t_{EQC} + (P+1)t_{XXY} + t_{SET} \\
&+ [9n_2 + 4n_1 + 4]\, t_{FRF} + [n_2(P_1+7) + \lceil TA/32 \rceil + 3n_1]\, t_{LCA} \\
&+ [P+4]\, t_{CLR} + [n_2(P_1+7) + 3n_1 + P]\, t_{SCM} \\
&+ [2P + n_2(P_1+7) + 3n_1]\, t_{TRN} + [n_1 + n_2 + 10]\, t_{ENR}
\end{aligned}
$$

To obtain the equation for $T_{1II}$, the total time to perform
Job 1 for Type II synonyms, the times for each of the
macro-functions must be summed. This is accomplished by summing the
timing equations shown in Table VII-2 for synonym II.

$$T_{1II} = T_{METS} + T_{FRTA} + T_{CONN} + T_{MATD} + T_{SRET} + T_{CONN} + T_{SFRD} + T_{SRET} + T_{CRET}$$

$$
\begin{aligned}
T_{1II} ={}& (P)t_{LCM} + (P)t_{EQC} + (P+1)t_{XXY} + t_{SET} + [9n_2 + 5]\, t_{FRF} \\
&+ [3 + \lceil TA/32 \rceil + n_2(P_1+7)]\, t_{LCA} + [P+4]\, t_{CLR} \\
&+ [n_2(P_1+7) + 3 + P]\, t_{SCM} + [2P + n_2(P_1+7) + 3]\, t_{TRN} \\
&+ [n_2 + 8]\, t_{ENR}
\end{aligned}
$$
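The Job 1 totals follow mechanically from adding the per-step counts of Table VII-2. The sketch below performs that summation with `collections.Counter`; it is an illustrative check and not part of the DDP, the function names are hypothetical, and the parameter `c` stands for the ceiling term that multiplies the LCA time in FRTA.

```python
from collections import Counter


def mets(w):
    return Counter(TRN=w, LCM=w, EQC=w, CLR=w + 1, XXY=w + 1, SET=1)


def frta(c):
    return Counter(ENR=2, FRF=1, LCA=c)


def conn():
    return Counter(ENR=1)


def matd(n):
    return Counter(CLR=1, ENR=n + 2, FRF=n + 1)


def retrieve_words(k):
    """k single-word retrieves, each needing its own FRF."""
    return Counter(FRF=k, LCA=k, SCM=k, TRN=k)


def sret_attributes(n2, p1):
    """Step 5: per attribute, a P1-word name plus seven one-word items."""
    name = Counter(FRF=n2, LCA=n2 * p1, SCM=n2 * p1, TRN=n2 * p1)
    return name + retrieve_words(7 * n2)


def cret(w):
    return Counter(SCM=w, TRN=w, CLR=1)


def job1(p, p1, n1, n2, c, synonym_type):
    """Sum the micro-function counts of every Job 1 step."""
    steps = [mets(p), frta(c), conn(), matd(n2),
             sret_attributes(n2, p1), conn()]
    if synonym_type == "I":
        steps += [Counter(ENR=2),                           # RELA
                  Counter(ENR=n1 + 2, FRF=n1 + 1, CLR=1),   # MFRD
                  retrieve_words(3 * n1) + Counter(FRF=1),  # MRET
                  cret(p)]
    else:
        steps += [Counter(ENR=2, CLR=1),                    # SFRD
                  retrieve_words(3),                        # SRET
                  cret(p)]
    return sum(steps, Counter())
```

For example, `job1(2, 3, 2, 4, 1, "I")["FRF"]` evaluates the FRF coefficient for P = 2, P1 = 3, n1 = 2, n2 = 4, which should agree with 9n2 + 4n1 + 4.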

JOB 2

The second job is for the DBMS to provide a subset of the
occurrences of an F/R given the F/R name and some operation on
a number ($n_5$) of its attributes. The development of the timing
equation for this job is similar to the development for the first
job. Consider Fig. VII-2 and Table VII-3. The first operation
is to perform $n_5$ ETS's, one for each of the attribute names given
to the DBMS. This is followed by a determination of each of their
shortened names and the F/R names ($n_4$) with which all $n_5$ attributes
are associated. The next operation is to determine the descriptors
of the $n_5$ attributes and to retrieve the attribute names,
shortened names, and their descriptors. The DDP then retrieves
all ($n_4$) the F/R names that contain the $n_5$ attributes. If
one of these names is the F/R name given in the query to the DBMS,
then the process continues by performing an ETS on the F/R name.
The rest of Job 2 proceeds identically to Job 1 (i.e. from Job 1,
step 6 onward).

The first eight steps shown in Fig. VII-2 and Table VII-3
for Job 2 are identical for synonyms Type I and II. These are
as follows:

Step 1)  ($n_5$) multiple ($P_1$) word equal to searches are
performed to find the given attribute names in AM1;

Step 2)  Find all the given ($n_5$) attribute names' shortened
names by interacting AM1 to AM2;

Step 3)  Find the F/R(s) ($n_4$) with which all the given $n_5$
attributes are associated by interacting AM2
to AM3 using ARRAY III;

Step 4)  For each attribute ($n_5$), one at a time, find
their six descriptors in the RAM/AM;


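Table VII-3. Macro-Function Timing Equations for Job 2 and the
Dictionary/Directory Processor.

SYNONYM I

STEP  MACRO-FUNCTION  TIMING EQUATION

1     METS            $T_{METS} = (n_5)(P_1)[t_{TRN} + t_{LCM} + t_{EQC}] + (n_5)(P_1+1)[t_{CLR} + t_{XXY}] + (n_5)t_{SET}$

2     CONN            $T_{CONN} = t_{ENR}$

3     ATFR            $T_{ATFR} = \lceil TA/32 \rceil\, t_{EQC} + t_{SET} + (4)t_{ENR} + (\lceil TA/32 \rceil + 2)\, t_{CLR} + \lceil TA/32 \rceil\, t_{XXY} + t_{YXY}$

4     MATD            $T_{MATD} = t_{CLR} + (n_5+2)t_{ENR} + (n_5+1)t_{FRF}$

5     SRET            $T_{SRET} = (n_5)t_{FRF} + (n_5)(P_1)[t_{LCA} + t_{SCM} + t_{TRN}] + (7)(n_5)[t_{FRF} + t_{LCA} + t_{SCM} + t_{TRN}]$

6     MRET            $T_{MRET} = (P)(n_4)[t_{LCA} + t_{SCM} + t_{TRN}] + (n_4+1)t_{FRF}$

7     METS            $T_{METS} = (P)[t_{TRN} + t_{LCM} + t_{EQC}] + (P+1)[t_{CLR} + t_{XXY}] + t_{SET}$

8     CONN            $T_{CONN} = t_{ENR}$

9     RELA            $T_{RELA} = (2)t_{ENR}$

10    MFRD            $T_{MFRD} = (n_1+2)t_{ENR} + (n_1+1)t_{FRF} + t_{CLR}$

11    MRET            $T_{MRET} = (3)(n_1)[t_{LCA} + t_{SCM} + t_{TRN}] + (3n_1+1)t_{FRF}$

12    CRET            $T_{CRET} = P[t_{TRN} + t_{SCM}] + t_{CLR}$

SYNONYM II

STEP  MACRO-FUNCTION  TIMING EQUATION

1     METS            $T_{METS} = (n_5)(P_1)[t_{TRN} + t_{LCM} + t_{EQC}] + (n_5)(P_1+1)[t_{CLR} + t_{XXY}] + (n_5)t_{SET}$

2     CONN            $T_{CONN} = t_{ENR}$

3     ATFR            $T_{ATFR} = \lceil TA/32 \rceil\, t_{EQC} + t_{SET} + (4)t_{ENR} + (\lceil TA/32 \rceil + 2)\, t_{CLR} + \lceil TA/32 \rceil\, t_{XXY} + t_{YXY}$

4     MATD            $T_{MATD} = t_{CLR} + (n_5+2)t_{ENR} + (n_5+1)t_{FRF}$

5     SRET            $T_{SRET} = (n_5)t_{FRF} + (n_5)(P_1)[t_{LCA} + t_{SCM} + t_{TRN}] + (7)(n_5)[t_{FRF} + t_{LCA} + t_{SCM} + t_{TRN}]$

6     MRET            $T_{MRET} = (P)(n_4)[t_{LCA} + t_{SCM} + t_{TRN}] + (n_4+1)t_{FRF}$

7     METS            $T_{METS} = (P)[t_{TRN} + t_{LCM} + t_{EQC}] + (P+1)[t_{CLR} + t_{XXY}] + t_{SET}$

8     CONN            $T_{CONN} = t_{ENR}$

9     SFRD            $T_{SFRD} = (2)t_{ENR} + t_{CLR}$

10    SRET            $T_{SRET} = (3)[t_{FRF} + t_{LCA} + t_{SCM} + t_{TRN}]$

11    CRET            $T_{CRET} = P[t_{TRN} + t_{SCM}] + t_{CLR}$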
Step 5)  Before finding the next attribute's descriptors
(Step 4 again), retrieve the attribute's name,
shortened name and its six descriptors;

Step 6)  Before proceeding, the names of the ($n_4$) F/Rs
need to be retrieved from AM3. If one of them
is equivalent to the given F/R name, then the
job will continue;

Step 7)  A multiple (P) word equal to search is performed
to find the given F/R name in AM3;

Step 8)  To find the given F/R's shortened name requires
the interaction of AM3 to AM4.

If the DBMS has Type I synonyms, then the following steps are
performed:

Step 9)  Find all the F/R names and shortened names in
AM3 and AM4 that are related to the F/R name
found in AM3 by Step 7;

Step 10)  For each ($n_1$) F/R, find its descriptors;

Step 11)  Before finding the next F/R's descriptors
(Step 10 again), retrieve the F/R's shortened
name and its descriptors;

Step 12)  Before ending the job, the given F/R name
stored in the comparand register of AM3 must be
retrieved.

If the DBMS has Type II synonyms, then the following steps
are performed:

Step 9)  For the given F/R name, find all its descriptors;

Step 10)  For the given F/R name, retrieve all its descriptors
and shortened name;

Step 11)  Before ending the job, the given F/R name
stored in the comparand register of AM3 must be
retrieved.

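To obtain the equation $T_{2I}$, which expresses the total time
to perform Job 2 for Type I synonyms, the times for each of the
macro-functions must be summed. This is accomplished by summing
the timing equations shown in Table VII-3 for synonym I.

$$
\begin{aligned}
T_{2I} ={}& T_{METS} + T_{CONN} + T_{ATFR} + T_{MATD} + T_{SRET} + T_{MRET} \\
&+ T_{METS} + T_{CONN} + T_{RELA} + T_{MFRD} + T_{MRET} + T_{CRET}
\end{aligned}
$$

$$
\begin{aligned}
T_{2I} ={}& [n_5P_1 + P]\, t_{LCM} + [\lceil TA/32 \rceil + n_5P_1 + P]\, t_{EQC} \\
&+ [n_5(P_1+1) + \lceil TA/32 \rceil + P + 1]\, t_{XXY} + [n_5 + 2]\, t_{SET} \\
&+ [9n_5 + n_4 + 4n_1 + 4]\, t_{FRF} + [n_5(P_1+7) + Pn_4 + 3n_1]\, t_{LCA} \\
&+ [n_5(P_1+1) + \lceil TA/32 \rceil + P + 6]\, t_{CLR} \\
&+ [n_5(P_1+7) + Pn_4 + 3n_1 + P]\, t_{SCM} \\
&+ [n_5(2P_1+7) + Pn_4 + 3n_1 + 2P]\, t_{TRN} + t_{YXY} \\
&+ [n_5 + n_1 + 12]\, t_{ENR}
\end{aligned}
$$

To obtain the equation $T_{2II}$, the total time to perform Job 2
for Type II synonyms, the times for each of the macro-functions
must be summed. This is accomplished by summing the timing
equations shown in Table VII-3 for synonym II.

$$
\begin{aligned}
T_{2II} ={}& T_{METS} + T_{CONN} + T_{ATFR} + T_{MATD} + T_{SRET} + T_{MRET} \\
&+ T_{METS} + T_{CONN} + T_{SFRD} + T_{SRET} + T_{CRET}
\end{aligned}
$$

$$
\begin{aligned}
T_{2II} ={}& [n_5P_1 + P]\, t_{LCM} + [\lceil TA/32 \rceil + n_5P_1 + P]\, t_{EQC} \\
&+ [n_5(P_1+1) + \lceil TA/32 \rceil + P + 1]\, t_{XXY} + [n_5 + 2]\, t_{SET} \\
&+ [9n_5 + n_4 + 5]\, t_{FRF} + [n_5(P_1+7) + Pn_4 + 3]\, t_{LCA} \\
&+ [n_5(P_1+1) + P + 6 + \lceil TA/32 \rceil]\, t_{CLR} \\
&+ [n_5(P_1+7) + P(n_4+1) + 3]\, t_{SCM} \\
&+ [n_5(2P_1+7) + P(n_4+2) + 3]\, t_{TRN} + t_{YXY} \\
&+ [n_5 + 10]\, t_{ENR}
\end{aligned}
$$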
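As with Job 1, the Job 2 totals can be cross-checked by summing the step counts of Table VII-3 and comparing them with the collected coefficients. The sketch below does this for the Type II case; it is a hedged illustration, the function names are hypothetical, and `c` again stands for the ceiling term appearing in ATFR.

```python
from collections import Counter


def scaled(counts, k):
    return Counter({name: k * v for name, v in counts.items()})


def mets(w):
    return Counter(TRN=w, LCM=w, EQC=w, CLR=w + 1, XXY=w + 1, SET=1)


def atfr(c):
    return Counter(EQC=c, SET=1, ENR=4, CLR=c + 2, XXY=c, YXY=1)


def matd(n):
    return Counter(CLR=1, ENR=n + 2, FRF=n + 1)


def retrieve_words(k):
    """k single-word retrieves, each needing its own FRF."""
    return Counter(FRF=k, LCA=k, SCM=k, TRN=k)


def mret(n, w):
    return Counter(FRF=n + 1, LCA=n * w, SCM=n * w, TRN=n * w)


def cret(w):
    return Counter(SCM=w, TRN=w, CLR=1)


def job2(p, p1, n1, n4, n5, c, synonym_type):
    """Sum the micro-function counts of every Job 2 step."""
    names = Counter(FRF=n5, LCA=n5 * p1, SCM=n5 * p1, TRN=n5 * p1)
    steps = [scaled(mets(p1), n5), Counter(ENR=1), atfr(c), matd(n5),
             names + retrieve_words(7 * n5), mret(n4, p),
             mets(p), Counter(ENR=1)]
    if synonym_type == "I":
        steps += [Counter(ENR=2), Counter(ENR=n1 + 2, FRF=n1 + 1, CLR=1),
                  mret(3 * n1, 1), cret(p)]
    else:
        steps += [Counter(ENR=2, CLR=1), retrieve_words(3), cret(p)]
    return sum(steps, Counter())


def job2_type2_closed_form(p, p1, n4, n5, c):
    """Collected coefficients of the Type II Job 2 equation."""
    return Counter(
        LCM=n5 * p1 + p, EQC=c + n5 * p1 + p,
        XXY=n5 * (p1 + 1) + c + p + 1, SET=n5 + 2,
        FRF=9 * n5 + n4 + 5, LCA=n5 * (p1 + 7) + p * n4 + 3,
        CLR=n5 * (p1 + 1) + p + 6 + c,
        SCM=n5 * (p1 + 7) + p * (n4 + 1) + 3,
        TRN=n5 * (2 * p1 + 7) + p * (n4 + 2) + 3, YXY=1, ENR=n5 + 10)
```

The Type I branch of `job2` can be compared against the Type I coefficients in the same way.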


JOB  3 

The  third  generic  job  for  the  DBMS  is  to  modify  all  the 
occurrences  of  an  attribute  in  a data  base  with  Type  II  synonyms. 
This  requires  the  DBMS  to  modify  all  the  F/Rs  which  have  an 
association  with  either  the  attribute  in  question  or  any  of  its 
synonyms.  The  process  is  shown  in  Fig.  VII-3  and  Table  VII-4. 

The  process  assumes  the  attribute  shortened  name  has  been 
previously  determined,  therefore  the  first  step  is  to  perform  an 
ETS  on  the  attribute's  shortened  name.  The  next  step  is  to 


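Table VII-4. Macro-Function Timing Equations for Job 3 and the
Dictionary/Directory Processor.

STEP  MACRO-FUNCTION  TIMING EQUATION

1     SETS            $T_{SETS} = t_{TRN} + t_{LCM} + t_{CLR} + t_{EQC}$

2     SYNO            $T_{SYNO} = (2)t_{ENR}$

3     MATD            $T_{MATD} = t_{CLR} + (n_6+2)t_{ENR} + (n_6+1)t_{FRF}$

4     SRET            $T_{SRET} = (7)(n_6)[t_{FRF} + t_{LCA} + t_{SCM} + t_{TRN}]$

5     MSFR            $T_{MSFR} = [(2 + \lceil TA/32 \rceil)\, n_6 + 1]\, t_{CLR} + (n_6)\lceil TA/32 \rceil\, t_{EQC} + (3n_6 + 2)\, t_{ENR} + (n_6)\, t_{SET} + (n_6)\lceil TA/32 \rceil\, t_{XXY} + (n_6)\, t_{YXY} + (n_6 + 1)\, t_{FRF}$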
determine all (n_6) of its synonyms and retrieve all of their
shortened names and descriptors. Then each attribute shortened
name is used to find its associated (n_3) F/Rs. The F/R
shortened names are then found and retrieved with their descriptors.
Once all n_3 F/R shortened names and their descriptors have been
retrieved, AM3 is cleared so that the next synonym attribute can
be processed.

A detailed description of this job can be obtained from
the following steps describing Fig. VII-3 and Table VII-4:

Step 1) Perform an equal to search on the given
attribute's shortened name using AM2;

Step 2) Determine the given attribute's synonyms by
utilizing ARRAY I and AM1. This will provide
the attribute names and shortened names
of all its synonyms;

Step 3) For each attribute (n_6), one at a time, find
its six descriptors in the RAM/AM;

Step 4) Before finding the next attribute's descriptors
(Step 3 again), retrieve the attribute's shortened
name and its six descriptors;

Step 5) For each synonym attribute name and shortened
name (n_6), one at a time, find its associated
F/R names (n_3) by using ARRAY III to associate
AM1 and AM2 with AM3;

Step 6) Before performing Step 5 again, the resultant
F/R names from Step 5 are to be interacted to
their shortened names in AM4 and then Step 7
would be performed;

Step 7) For each F/R (n_3), one at a time, find its
two descriptors in the RAM/AM;

Step 8) Before finding the next F/R's descriptors
(Step 7 again), retrieve the F/R's shortened
name and its two descriptors. If all n_3 F/Rs
are complete, go to Step 9;

Step 9) Clear the response register of AM3 and proceed
to Step 5. If all attributes have been processed,
the job is ended.
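The nine steps above amount to one pass over the synonym attributes followed by a nested pass over each synonym's F/Rs. The following Python sketch illustrates only that control flow. Plain dictionaries stand in for the associative memories and arrays, and every name in it (`job3`, `synonym_group`, `frs_of_attribute`, the sample data) is a hypothetical illustration rather than part of the DDP design.

```python
def job3(attribute, synonym_group, attr_descriptors,
         frs_of_attribute, fr_short_name, fr_descriptors):
    """Collect descriptors and associated F/Rs for an attribute and its synonyms."""
    # Step 1: equal to search on the attribute's shortened name.
    if attribute not in synonym_group:
        return None
    # Step 2: determine the synonyms (assumed here to include the attribute itself).
    names = synonym_group[attribute]
    retrieved = {}
    # Steps 3-4: find and retrieve each attribute's descriptors, one at a time.
    for name in names:
        retrieved[name] = {"descriptors": attr_descriptors[name], "frs": []}
    # Steps 5-9: for each synonym attribute, find and retrieve its F/Rs.
    for name in names:
        for fr in frs_of_attribute.get(name, []):        # Step 5
            short = fr_short_name[fr]                    # Step 6
            retrieved[name]["frs"].append(               # Steps 7-8
                (short, fr_descriptors[short]))
        # Step 9: the response register would be cleared here before
        # the next synonym attribute is processed.
    return retrieved


# Hypothetical sample data: one attribute with one synonym.
SYNONYMS = {"EMPNO": ["EMPNO", "BADGE"]}
ATTR_DESC = {"EMPNO": ("d",) * 6, "BADGE": ("d",) * 6}
FRS_OF = {"EMPNO": ["PAYROLL"], "BADGE": ["SECURITY", "PAYROLL"]}
FR_SHORT = {"PAYROLL": "PAY", "SECURITY": "SEC"}
FR_DESC = {"PAY": ("a", "b"), "SEC": ("c", "d")}

RESULT = job3("EMPNO", SYNONYMS, ATTR_DESC, FRS_OF, FR_SHORT, FR_DESC)
```

The inner loop runs once per associated F/R for every synonym, which is why the F/R-related terms of the timing equation are multiplied by the number of synonyms.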

To obtain the equation T_{3II}, the total time to perform
Job 3 for Type II synonyms, the time for each of the
macro-functions must be summed. This is accomplished by summing the
timing equations shown in Table VII-4.

T_{3II} = T_{SETS} + T_{SYNO} + T_{MATD} + T_{SRET} + T_{MSFR} + T_{CONN}

+ T_{MFRD} + T_{SRET} + (n_6) T_{CLR}.

T5II  " tIiCM  + [n6^[fll)  + 1]tEQC  + tn6^ffll):,tXXY  + ^^SET 

+ tn6(10+S)  + 2]tFRF  + [n6(7+n3)]tLCA 

♦ ln6<U  * [fll>  + 31‘om  * tn6<™-5»‘SCM 

* * HtraN  + C"6Hm  + [n6("5*7)  + 6)1^ 


JOB  4 


The fourth, and last, generic job is to modify all the
occurrences of an attribute in a data base with Type I synonyms.
This requires the DBMS to modify all the F/Rs which have an
association with the stored synonym of the attribute in question.
The process is shown in Fig. VII-4 and Table VII-5. As discussed
in Job 3 above, the attribute's shortened name is used in an ETS.
This is followed by locating the stored synonym's descriptors.
Then the first attribute shortened name (i.e. the stored
synonym's shortened name) is used to determine all associated
(n_3) F/R names. Then the process finds and retrieves all (n_3)
of the F/Rs' shortened names and their descriptors. The last
step retrieves the stored attribute's shortened name and its
descriptors, and clears AM1's response register.

A detailed description of this job can be obtained from
the following steps describing Fig. VII-4 and Table VII-5:

Step 1) Perform an equal to search on the given attribute's
shortened name using AM2;

Step 2) Determine the given attribute's stored synonym
and its descriptors;

Step 3) From Step 2 only the stored synonym is respondent
in AM2; therefore find all (n_3) of the F/Rs with
which it is associated by using ARRAY III to
interact with AM3;

Step 4) Find all (n_3) of the shortened names of the F/Rs
found in Step 3 by interacting AM3 to AM4;


Table VII-5. Macro-Function Timing Equations for Job 4 and the
Dictionary/Directory Processor.


STEP  MACRO- FUNCTION  TIMING  EQUATION 

1 SETS  TSETS  = tTRN  + tLCM  + tCLR  + tEQC 


TSITD  " (^ENR  + ^2^CIR  + tFRA 


djll^EQC  + tSET  + ^^ENR 

+ (fill  * 2 ^cift  * ([ffl^xXY  + ‘m 


T_{CONN} = t_{ENR}




T_{MFRD} = (n_3)t_{ENR} + (n_3+1)t_{FRF} + t_{CLR}


T_{SRET} = (3)(n_3)(t_{FRF} + t_{LCA} + t_{SCM} + t_{TRN})


T_{SRET} = (7)(t_{FRF} + t_{LCA} + t_{SCM} + t_{TRN})


T_{CLR} = t_{CLR}


Step 5) For each F/R name and shortened name, one at
a time, find its two descriptors by
interacting ARRAY IV to the RAM/AM;

Step 6) Before finding the next F/R's descriptors
(Step 5 again), retrieve the F/R's shortened
name and its two descriptors;

Step 7) Retrieve the stored synonym's attribute
shortened name and its six descriptors from
AM2 and the RAM/AM, respectively;

Step 8) Clear the response register of AM1.

To obtain the equation T_{4I}, the total time to perform
Job 4 for Type I synonyms, the time for each of the
macro-functions must be summed. This is accomplished by summing
the timing equations shown in Table VII-5.

T_{4I} = T_{SETS} + T_{SITD} + T_{ATFR} + T_{CONN} + T_{MFRD}
+ T_{SRET} + T_{SRET} + T_{CLR}

T4l  3 tLCM  + ^fjll  + 1'*tEQC  + '•[fiI^XXY  + tSET 

* 'S  * 8ItFSF  * [5”3  * 7!tLCA  + [ffll  + 7ltCU, 

+ [5nj  + 7]tgCM  + C5n^  + 83tTRN  + + [n5  + ^^ENR 


+ t 


FRA* 


This concludes the development of the timing equations for
the DDP. These equations are general and can be evaluated for
different hardware capabilities, e.g. transfer rate, degree of
parallelism, etc. The actual timings appearing at the end of
this chapter are for those timings of the micro-functions that
were obtained from the STARAN. If different timings are
available, they could be used in the equations derived above.


SEQUENTIAL  COMPUTER  IMPLEMENTATION 

The  next  portion  of  this  chapter  describes  the  development 
of  the  mathematical  equations  that  can  be  used  to  determine  the 
time  to  perform  the  same  generic  jobs  on  a sequential  machine. 

Like  the  DDP  discussed  above,  the  MIX  computer's  software 
directs  two  basic  tasks  in  the  performance  of  the  generic  jobs. 

The  first  task  is  to  perform  an  equal  to  search  (ETS)  and  the 
second  task  is  to  transfer  the  data  found  with  the  "successful" 

ETS  to  the  user's  area  in  the  main  memory  of  the  MIX  computer. 

To perform the first task, it was necessary to develop
a fast, efficient, and realistic ETS technique. Two techniques
were studied: hashing and explicit binary tree
searching. Appendix B provides a discussion of both techniques
and presents the data showing how much faster a hashing technique
is than an explicit binary search technique. A hashing
function can be divided into two parts: the hashing part and
the collision resolution part. The hashing part operates on the
value of the key and, based on that value, generates a memory
(or peripheral) location. If the contents of this location are
not equivalent to the key value of interest, then the collision
resolution part of the hashing technique must resolve the conflict.
Since this problem deals with a data dictionary and a partial data
directory, the keys to be hashed are F/R names and attribute names.
These keys are unique, finite, and, to a certain extent, do not
change their names very often. Therefore, it is assumed that a
hashing technique could be obtained that generated no collisions.

This assumption allows the timing results for the sequential
computer to be extremely fast. It also implies that the algorithm
must be capable of accommodating some minor changes due to
additions, deletions, and changes to F/R and attribute names.
However, if a data base has many changes in names, number of F/Rs
and attributes, then this assumption should be discarded and a
model should be developed to represent the additional time required
to handle collisions.
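A minimal Python sketch of a collision-free division hash may make the two-part description concrete. It assumes integer keys, stores in each hashed slot the address of the stored key (as the chapter describes later), and treats any collision as a violation of the stated assumption. The divisor 11, the sample keys, and the names `build` and `lookup` are illustrative assumptions, not values or routines from the text.

```python
END_OF_RECORD = 99  # end-of-record marker (the value used by the transfer code)


def build(records, M):
    """Lay out records in memory and build the hash table.

    records maps an integer key to its list of affiliated data words.
    """
    remainders = [key % M for key in records]
    S = min(remainders)                      # normalizes slots to start at zero
    table = [None] * (max(remainders) - S + 1)
    memory = []
    for key, data in records.items():
        slot = key % M - S
        if table[slot] is not None:
            raise ValueError("collision: the no-collision assumption fails")
        table[slot] = len(memory)            # slot holds the address of the key
        memory.extend([key, *data, END_OF_RECORD])
    return table, memory, S


def lookup(key, table, memory, M, S):
    """Return the address of the stored key, or None when the search fails."""
    slot = key % M - S
    if not 0 <= slot < len(table) or table[slot] is None:
        return None
    address = table[slot]
    # A single comparison suffices because no collisions are assumed.
    return address if memory[address] == key else None


RECORDS = {23: [1, 2], 47: [3], 60: [4, 5, 6]}
TABLE, MEMORY, S = build(RECORDS, M=11)
```

Because the table is built once from a known, slowly changing set of names, a divisor that produces no collisions can be searched for offline. If names changed often, `build` would start raising errors, which is the situation in which the text says a collision model is needed.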

The timing equations for the hashing and the transfer of
data are developed by writing the software code in MIXAL, MIX's
assembly language, and then summing, over all statements, the
number of times each statement or operation is executed multiplied
by the time required to perform its operation divided by U. This
provides a normalized time per MIXAL statement, i.e.

NTS = (no. of times an operation is executed)(time units/
operation) / U.

The NTS for each statement is summed and then multiplied by U,
which is a unit of time or relative measure (e.g. μseconds)
described by Knuth [24], such that

. . . ADD, SUB, all LOAD operations, all STORE operations
(including STZ), all shift commands and all comparison
operations take two units of time. MOVE requires one
unit plus two for each word moved. MUL requires 10 and
DIV requires 12 units. Execution time for floating-point
operations is unspecified. All remaining
operations take one unit of time, plus the time the
computer may idle on the IN, OUT, IOC, or HLT
instructions.

Therefore the total time for a MIXAL code is

( \sum_{\forall statements} NTS ) U.
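The cost model is easy to mechanize. The sketch below, in Python, assigns each mnemonic the unit cost from the quoted passage and sums count times cost over a program. The program listed is the thirteen-statement, one-word hashing code given later in this chapter, with the comparison instruction spelled CMPA, its standard MIXAL mnemonic. The names `UNITS`, `ONE_WORD_HASH`, and `total_units` are assumptions made for illustration.

```python
# Unit costs taken from the quoted timing rules.
UNITS = {
    "LDA": 2, "LDX": 2, "LD1": 2,    # LOAD operations
    "STA": 2,                        # STORE operation
    "SUB": 2, "SLAX": 2, "CMPA": 2,  # arithmetic, shift, comparison
    "DIV": 12,
    "ENTA": 1, "JE": 1,              # "all remaining operations"
}

# (mnemonic, number of times executed) for the one-word hashing code.
ONE_WORD_HASH = [
    ("LDX", 1), ("ENTA", 1), ("DIV", 1), ("SLAX", 1), ("SUB", 1),
    ("STA", 1), ("LD1", 1), ("LDA", 1), ("STA", 1), ("LD1", 1),
    ("LDA", 1), ("CMPA", 1), ("JE", 1),
]


def total_units(program):
    """Sum of NTS over all statements, expressed in multiples of U."""
    return sum(count * UNITS[op] for op, count in program)
```

Evaluating `total_units(ONE_WORD_HASH)` gives the number of time units U consumed by one execution of that listing.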

HASHING  TECHNIQUE 

The  hashing  technique  chosen  for  the  MIX  computer  is  a 
simple  division  method  where  the  key  value  is  divided  by  a value 
M and  the  remainder  is  used  as  the  location  (or  address)  of  where 
the  key  is  stored.  The  derivation  of  the  total  time  for  keys  of 
one  computer  word  long  will  be  developed  first.  This  will  be 
followed  by  similar  developments  for  keys  whose  sizes  are  two 
computer words and three computer words in length. From these
times a general equation for key sizes ≥ 2 computer words will
be developed.

For the development of the hashing timing equations, let M
be the key's dividing value, let L be a dummy location in the
memory, and let S be the smallest possible hashed key value so
that when subtracted from the remainder it normalizes the
locations to the zero location. The S variable emulates some
control over where the key values are stored. The other important
point in understanding the following codes is that the hashed
location holds the address of where the key value is
stored. This allows the data affiliated with each key value to
be stored with the key value without interfering with
the hashing locations of other key values. Shown in Fig. VII-5
is an example of the memory layout being discussed. The key
values to be hashed on are stored in locations k_1, k_2, and k_3.
These values, after being hashed, provide a location value between
location 0 and RST-1. The contents of these locations contain
the location of the key value followed by its affiliated data.
Examples of these locations are RST and UXZ.

Keys  of  One  Computer  Word  Long 

Consider  the  following  MIXAL  code  for  hashing  a key  that 
is  one  computer  word  long. 


Line  Loca-    Opera-
No.   tion     tion    Address  NTS  Explanation

01             LDX     K          2  Load the searching key value
                                     into register X, rX
02             ENTA    0          1  Sets rA = 0
03             DIV     =M=       12  Divides rA and rX by M
04             SLAX    5          2  Shifts the remainder in rX
                                     into rA
05             SUB     =S=        2  Subtract from the remainder
                                     the lowest possible address
06             STA     L          2  Stores the result of line 5
                                     into location L
07             LD1     L          2  Load the contents of
                                     location L into r1
08             LDA     0,1        2  Loads the address of the
                                     hashed key value in rA
09             STA     L          2  Stores the hashed key
                                     value's address in location L
10             LD1     L          2  Loads the hashed key
                                     value's address in r1
11             LDA     K          2  Load the searching key
                                     value in rA
12             CMPA    0,1        2  Compare the searching key
                                     value with hashed key's value
13             JE      SUCCESS    1  Send control to location
                                     SUCCESS if the searching key
                                     value is equal to the hashed
                                     key's value

Figure VII-5. Sample Sequential Memory Data Arrangement

In the above code the searching key is to be stored in
memory address K. Parameters appearing between equal signs,
e.g. =S=, imply the use of the actual value S. The notations rA,
rX, etc. represent register A, register X, etc. The
total time for the above code is

Total time = ( \sum_{statement=01}^{13} NTS ) U = 34U.

Keys of Two Computer Words Long

Consider the following MIXAL code for hashing a key that
is two computer words long.

Line  Loca-    Opera-
No.   tion     tion    Address  NTS  Explanation

01             LDX     K1         2  Load 2nd half of searching
                                     key into rX
02             LDA     K2         2  Load 1st half of searching
                                     key into rA
03             DIV     =M=       12  Divide rA and rX by M
04             SLAX    5          2  Shifts the remainder in rX
                                     into rA
05             SUB     =S=        2  Subtract from the remainder
                                     the lowest possible address
06             STA     L          2  Stores the result of line 5
                                     into location L
07             LD1     L          2  Load the contents of
                                     location L into r1
08             LDA     0,1        2  Loads the address of the
                                     hashed key value into rA
09             STA     L          2  Stores the hashed key
                                     value's address in location L
10             LD1     L          2  Loads the hashed key
                                     value's address in r1
11             LDA     K2         2  Load 1st half of the
                                     searching key value in rA
12             CMPA    0,1        2  Compare 1st half of searching
                                     key value with 1st half of
                                     hashed key's value
13             JNE     BEND       1  If they are not equal, jump
                                     to bad end (BEND)
14             LDA     K1         2  Load 2nd half of searching
                                     key value in rA
15             CMPA    1,1        2  Compare 2nd half of searching
                                     key value with 2nd half of
                                     hashed key's value
16             JE      SUCCESS    1  Jump, if they are equal, to
                                     SUCCESS
17    BEND

In the above code, the "higher" portion of the searching key
is assumed to be stored in memory location K2 and the "lower"
portion is assumed to be stored in memory location K1. The
"higher" and "lower" portions are the most significant
bits and the least significant bits of the key when it is represented
as a binary integer. For example, if the word CAT were converted to a
binary integer, the C would have the most significant bits and
T would have the least significant bits.

The total time for the above code is

Total time = ( \sum_{statement=01}^{17} NTS ) U = 40U.

Keys  of  Three  Computer  Words  Long 

Consider  the  following  MIXAL  code  for  hashing  a key  that  is 
three  computer  words  long. 


Line  Loca-    Opera-
No.   tion     tion    Address  NTS  Explanation

01             LDX     K2         2  Load middle 3rd of searching
                                     key into rX
02             LDA     K3         2  Load highest 3rd of
                                     searching key into rA
03             DIV     =M=       12  Divide rA and rX by M
04             SLAX    5          2  Shifts the remainder in rX
                                     into rA
05             LDX     K1         2  Load lowest 3rd of searching
                                     key into rX
06             DIV     =M=       12  Divide rA and rX by M
07             SLAX    5          2  Shifts the remainder in rX
                                     into rA
08             SUB     =S=        2  Subtract from the remainder
                                     the lowest possible address
09             STA     L          2  Stores the result of line 8
                                     into location L
10             LD1     L          2  Load the contents of
                                     location L in r1
11             LDA     0,1        2  Loads the address of the
                                     hashed key value into rA
12             STA     L          2  Stores the hashed key
                                     value's address in location L
13             LD1     L          2  Loads the hashed key
                                     value's address in r1
14             LDA     K3         2  Load the highest 3rd of the
                                     searching key value in rA
15             CMPA    0,1        2  Compare the highest 3rd of
                                     the searching key value with
                                     the highest 3rd of the hashed
                                     key's value
16             JNE     BEND       1  If they are not equal, jump
                                     to bad end (BEND)
17             LDA     K2         2  Load middle 3rd of the
                                     searching key value in rA
18             CMPA    1,1        2  Compare the middle 3rd of the
                                     searching key value with the
                                     middle 3rd of the hashed
                                     key's value
19             JNE     BEND       1  If they are not equal, jump
                                     to bad end (BEND)
20             LDA     K1         2  Load lowest 3rd of the
                                     searching key value in rA
21             CMPA    2,1        2  Compare lowest 3rd of the
                                     searching key value with
                                     lowest 3rd of the hashed
                                     key's value
22             JE      SUCCESS    1  Jump, if they are equal, to
                                     SUCCESS
      BEND

In the above code, the highest 3rd of the searching key
is assumed to be stored in memory location K3; the middle 3rd of
the searching key is assumed to be stored in memory location K2;
and the lowest 3rd of the searching key is assumed to be stored in
memory location K1. The total time for the above code is

Total time = ( \sum_{statement=01}^{22} NTS ) U = 61U.


Key  Sizes  Equal  To  or  Greater  Than  Two 

To develop a general timing equation for keys ≥ 2 computer
words long requires a study of the two MIXAL codes for key lengths
of two computer words and three computer words. The difference
between the two codes is six statements. To convert the code for
two computer words to three computer words requires the addition
of lines 05, 06, 07, 19, 20, and 21. Lines 05-07 load the 3rd
portion of the key word, divide by M, and shift the remainder
into rA. Lines 19-21 provide a jump to BEND if the previous portion
of the key was not an exact compare, load rA with the 3rd portion
of the key word, and compare this portion with the stored key's
value. The total time for these six lines is 21U (16U for lines
05-07 and 5U for lines 19-21). For any
additional key lengths, say four computer words long, six more
lines with the same operations must be added to the MIXAL code.
Intuitively, the increase in time for a key of w computer words
long, where w ≥ 2, is (w - 2)21U beyond what it takes in time for
a key of two computer words long (40U).

In this section it has been shown that the time to hash on
a key one computer word long is 34U. For keys that are two
computer words long or greater (w ≥ 2) the total time is

[40 + (w - 2)21]U.

This completes the presentation of an implementation and its
timing equations for software to find data utilizing a hashing
function. The next portion of this chapter deals with the
modeling of the data following the stored key values. These
are the data that must be transferred (i.e. the second task)
from the data dictionary and partial data directory area of the
memory to the user's area.
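The closed form can be cross-checked by generating the instruction sequence for a key of w words and summing its unit costs. The Python sketch below does this. For w of 1, 2, and 3 the generated sequence follows the three listings in this section; for w of 4 or more it extrapolates the same pattern, as the text suggests, and that extrapolation is an assumption. The function names are hypothetical.

```python
# Unit costs per Knuth's timing rules quoted earlier in the chapter.
UNITS = {"LDA": 2, "LDX": 2, "LD1": 2, "STA": 2, "SUB": 2, "SLAX": 2,
         "CMPA": 2, "DIV": 12, "ENTA": 1, "JE": 1, "JNE": 1}


def hash_program(w):
    """Mnemonics executed when hashing a key of w computer words."""
    if w < 1:
        raise ValueError("key size must be at least one word")
    if w == 1:
        ops = ["LDX", "ENTA", "DIV", "SLAX"]
    else:
        ops = ["LDX", "LDA", "DIV", "SLAX"]
        ops += ["LDX", "DIV", "SLAX"] * (w - 2)   # fold in each further word
    ops += ["SUB", "STA", "LD1", "LDA", "STA", "LD1"]
    ops += ["LDA", "CMPA"]                         # compare the first portion
    ops += ["JNE", "LDA", "CMPA"] * (w - 1)        # compare remaining portions
    ops += ["JE"]
    return ops


def hash_time(w):
    """Hashing time in units of U, summed from the generated program."""
    return sum(UNITS[op] for op in hash_program(w))


def closed_form(w):
    """Hashing time in units of U from the chapter's equations."""
    return 34 if w == 1 else 40 + (w - 2) * 21
```

Each additional key word contributes `LDX`, `DIV`, `SLAX` (16 units) and `JNE`, `LDA`, `CMPA` (5 units), which is where the 21U increment comes from.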

DATA STRUCTURES

This portion of the chapter deals with the structure of
the data located in memory affiliated with the stored key
values (see Fig. VII-5). The structures are divided into two
groups. Group 1 is for Type I synonyms and Group 2 is for Type II
synonyms. Within each group there are two separate structures,
one for an F/R name and one for an attribute name. These
structures will be described using DP sets.

The structure of the data affiliated with the key values is
very important. It dictates, for instance, the amount of data
that must be moved to the user's memory area. Most importantly,
the structure directs the DBMS software in finding needed data
such as pointers to other data, the order of occurrences of data,
and the amount of data that is affiliated with a particular key.

Group  1 


The structures in Group 1 for an F/R name and an attribute
name can be described as two DP sets named G1FR and G1A,
respectively:

G1FR = {F/R name, F/R shortened name, F/R primary key,
        F/R pointer, n_1, n_2}
       ∪ P_s^{P+5}(F/RCM) ∪ ... ∪ P_s^{P+5+(n_1-1)3}(F/RCM)
       ∪ P_s^{P+5+n_1(3)}(AM) ∪ P_s^{P+5+n_1(3)+(P_1+7)}(AM) ∪ ...
       ∪ P_s^{P+5+n_1(3)+(n_2-1)(P_1+7)}(AM)
       ∪ P_s^{P+5+n_1(3)+n_2(P_1+7)}{End of Record}

where F/RCM = (F/R shortened name, F/R pointer, F/R primary key)
and AM = (Attribute name, attribute shortened name, size
descriptor, representation technique, synonym
descriptor, uniqueness descriptor, password
name, privileges name).


All the variables in the above DP sets are assumed to be one
computer word long except the F/R names and attribute names, which
are P and P_1 words long, respectively. The synonym and uniqueness
descriptors of an attribute only specify whether the respective
attribute has a synonym and whether the attribute's values are
unique. There are (n_1 + 1) F/R pointers. Each pointer
signifies the location of the G1FR of a related F/R. These
pointers provide a circular linked list of the data affiliated
with related F/Rs. The F/RCM DP sets provide the needed data
for accessing any of an F/R's related F/Rs without "following"
the linked list. The DBMS software can find the desired F/R
data (G1FR) either by using its F/R pointer or its primary key
available in F/RCM. The AM DP set is self-explanatory. For an
attribute name let


G1A = {Attribute name, attribute shortened name, size
       descriptor, representation technique, coding
       function to the only stored synonym, n_6, pointer
       1 to stored synonym, pointer 2, pointer 3, ...,
       pointer n_6, uniqueness descriptor, password name,
       privileges name, n_3}
      ∪ P_s^{9+n_6+P_1}(F/RM) ∪ ...
      ∪ P_s^{9+n_6+P_1+(n_3-1)(P+3)}(F/RM)
      ∪ P_s^{9+n_6+P_1+n_3(P+3)}{End of Record}

where the DP set F/RM = {F/R name, F/R shortened name,
F/R primary key, F/R pointer}.


The pointers one through n_6 are the locations or pointers to the
G1A structures of the synonyms of the attribute in question. The
first pointer in each G1A, as shown in Fig. VII-6, points to the
G1A structure of the attribute whose occurrences have the stored
values.

The structures in Group 2 for an F/R name and an attribute
name can be described as DP sets named G2FR and G2A, respectively.

G2FR = {F/R name, F/R shortened name, F/R primary key,
        F/R pointer, n_2}
       ∪ P_s^{P+4}(AM) ∪ P_s^{P+4+(7+P_1)}(AM)
       ∪ P_s^{P+4+2(7+P_1)}(AM) ∪ ...
       ∪ P_s^{P+4+(n_2-1)(7+P_1)}(AM)
       ∪ P_s^{P+4+n_2(7+P_1)}{End of Record}.

The corresponding structure for an attribute name for Group 2
is

G2A = {attribute name, attribute shortened name, size
       descriptor, representation technique, coding
       function to the first pointed to synonym, n_6,
       pointer 1, pointer 2, ..., pointer n_6, uniqueness
       descriptor, password name, privileges name, n_3}
      ∪ P_s^{9+n_6+P_1}(F/RM) ∪ P_s^{9+n_6+P_1+(P+3)}(F/RM) ∪ ...
      ∪ P_s^{9+n_6+P_1+(n_3-1)(P+3)}(F/RM)
      ∪ P_s^{9+n_6+P_1+n_3(P+3)}{End of Record}.

The pointers one through n_6 are the locations or pointers to
the G2A structures of the synonyms of the attribute in question.
The first pointer in each G2A, as shown in Fig. VII-7, forms a
circular linked list, with each G2A data structure maintaining
the coding function to the next synonym in the linked list.

These two groups of structures describe a physical structure
of the data dictionary and partial data directory implemented
in the main memory of a sequential computer.
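As an illustration of how such a structure lies in contiguous memory, the Python sketch below flattens a Group 2 F/R record (G2FR) into a list of words and records the offset at which each AM set begins. It assumes the field order given above (F/R name of P words, three one-word fields, the count of AM sets, the AM sets themselves, then the end-of-record marker). The helper names and the placeholder field values are hypothetical.

```python
END_OF_RECORD = 99  # end-of-record marker


def make_am(attribute_name_words):
    """One AM set: an attribute name of P1 words plus seven one-word fields."""
    return list(attribute_name_words) + [
        "short", "size", "repr", "synonym", "unique", "password", "privileges"]


def build_g2fr(fr_name_words, short_name, primary_key, pointer, am_sets):
    """Return the flattened record and the starting offset of each AM set."""
    record = list(fr_name_words) + [short_name, primary_key, pointer,
                                    len(am_sets)]
    offsets = []
    for am in am_sets:
        offsets.append(len(record))   # P + 4 + i(7 + P1) for the i-th AM set
        record.extend(am)
    record.append(END_OF_RECORD)
    return record, offsets


# Example with P = 2 words, P1 = 1 word, and n2 = 3 attributes.
AMS = [make_am(["A1"]), make_am(["A2"]), make_am(["A3"])]
RECORD, OFFSETS = build_g2fr(["FR", "NAME"], "FRS", "KEY", 0, AMS)
```

With these sizes the record length is P + 5 + n_2(7 + P_1) words, which is the quantity n that enters the transfer timing.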
DATA TRANSFER

The first portion of this section presents a general MIXAL
code for transferring data from one portion of main memory to
another. The data are assumed to be either G1FR, G1A, G2FR, or
G2A data located in the data dictionary and partial directory.
It is assumed the data are located by hashing on either an F/R
name P computer words long or an attribute name P_1 computer
words long. Once the data are chosen, the transfer code will
move the data to the user's work area. The second portion of
this section presents a development of a general expression for
a timing equation for hashing and transferring data for variable
key sizes.

General  Transfer  Code 

The MIXAL code presented below will transfer n-w words of
data from one portion of the computer's main memory to another
portion starting at an address of BEGIN + w, where w is the size,
in computer words, of the hashing key that located the data to be
transferred. It is assumed that the hashing has taken place
and was successful. Therefore control was sent to location
SUCCESS. This action will cause the following code to be
executed:


Line  Loca-    Opera-
No.   tion     tion    Address  NTS            Explanation

01    SUCCESS  ENT3    =99=     1(1)           Sets r3 to 99
02             ENT2    BEGIN    1(1)           Sets r2 to the value of BEGIN
03             STA     w-1,2    1(2)           Stores the value of rA into the
                                               location BEGIN plus "w-1"
04             INC2    w        1(1)           Add the value w to the
                                               contents of r2
05             INC1    w        1(1)           Add the value w to the
                                               contents of r1
06    S.O.     CMP3    0,1      (n-w)(2)       Compare the contents of r3 with
                                               the contents of r1, i.e.
                                               searching for End of Record
07             JE      F.I.     (n-w)(1)       If they are equal then jump
                                               to F.I. (finished)
08             LDA     0,1      [n-(w+1)](2)   Load rA with the value in the
                                               address stored in r1, i.e. the
                                               next word in the record
09             STA     0,2      [n-(w+1)](2)   Store the contents of rA into
                                               the next word in the user's
                                               work area
10             INC1    1        [n-(w+1)](1)   Add the value 1 to the
                                               contents of r1
11             INC2    1        [n-(w+1)](1)   Add the value 1 to the
                                               contents of r2
12             JMP     S.O.     [n-(w+1)](1)   Jump to location S.O.
                                               (start over)
13    F.I.     LDA     0,1      (1)(2)         Load rA with the contents in
                                               the address stored in r1, i.e.
                                               99, end of record marker
14             STA     0,2      (1)(2)         Store rA in the address stored
                                               in r2, i.e. 99, end of record
                                               marker, stored in user's work
                                               area
               END

Total time = ( \sum_{statement=01}^{14} NTS ) U = (3 + 10(n-w)) U.

In the above code, 99 signifies an end of record, BEGIN is the
first location in the user's area in which the transfer data
(G1FR, G1A, G2FR, or G2A) is to be stored contiguously, and "w-1"
signifies the number of computer words that should be added to r2
to determine the location to store the lowest portion of the hashed
key.
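The count of 3 + 10(n-w) units can be checked by simulating the transfer loop. The Python sketch below mirrors the fourteen statements, charging each the unit cost from the timing rules quoted earlier. It makes several simplifying assumptions: the source record is a Python list whose first w entries are the key words, index registers are plain integers, BEGIN is index 0 of the destination, and no data word equals the end marker 99. The function name `transfer` is hypothetical.

```python
END_OF_RECORD = 99


def transfer(record, w):
    """Simulate the transfer code; return (user_area, time in units of U)."""
    n = len(record)
    user_area = [None] * n
    r1, r2 = 0, 0          # r1: address of the stored key, r2: set to BEGIN
    units = 0
    units += 1             # 01 ENT3 =99=
    units += 1             # 02 ENT2 BEGIN
    user_area[r2 + w - 1] = record[w - 1]   # 03 STA w-1,2 (rA holds the lowest key word)
    units += 2
    r2 += w                # 04 INC2 w
    units += 1
    r1 += w                # 05 INC1 w
    units += 1
    while True:
        units += 2         # 06 CMP3 0,1
        units += 1         # 07 JE F.I.
        if record[r1] == END_OF_RECORD:
            break
        user_area[r2] = record[r1]          # 08 LDA 0,1 and 09 STA 0,2
        units += 4
        r1 += 1            # 10 INC1 1
        r2 += 1            # 11 INC2 1
        units += 2
        units += 1         # 12 JMP S.O.
    user_area[r2] = record[r1]              # 13 LDA 0,1 and 14 STA 0,2
    units += 4
    return user_area, units
```

For w greater than one, the first w-1 entries of the returned area stay unset, because this code stores only the lowest portion of the key. That gap is the reason extra store statements are added to the hashing code for longer keys.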

Time  for  Hashing  and  Transfer 

An intuitive derivation for a general expression of the time
for hashing on a key and then transferring the key's contiguous
stored data to the user's work area is developed below. Consider
the above code for a key size of one computer word, i.e. w = 1.
This implies a hashing time of 34U and a transfer time of
[3 + 10(n-1)]U, where n is the number of words in the transfer
data (i.e. the size of G1FR, G1A, G2FR, or G2A). The transfer
code repeatedly operates on (n-1) words of the data to be
transferred. The first word of the data is stored in the user's
area by the third statement in the above code, STA w-1,2.

Consider the case when w = 2. This implies that the hashing
time is [40 + (w-2)21]U = 40U and the transfer time is [3 +
10(n-2)]U. However, the summation of those two timing values
is not the total time for hashing and transfer. The problem is
that the highest portion of the key has not been stored in the
user's work area. To rectify this situation, a statement STA 0,2
needs to be inserted between statements 13 and 14 of the hashing
code for keys equal to two computer words (w = 2). Therefore,
the total time would be 40U + [3 + 10(n-2)]U + 2U. The same
procedure holds for the case when w = 3. The total time is the
hashing time, [40 + (w-2)21]U or 61U, plus the transfer time,
[3 + 10(n-3)]U, plus 4U. The additional 4U is due to storing the
highest and middle portions of the hashing key. The changes to the
hashing code for keys three words long would occur between
statements 16 and 17 and statements 19 and 20. The statement to be
inserted between 16 and 17 would be STA 0,2 and the statement to be
inserted between 19 and 20 would be STA 1,2. Therefore a general
expression for the total time to hash and transfer data is:

Total time = 34U + (3 + 10(n-1))U                         for w = 1

Total time = [40 + (w-2)21]U + [3 + 10(n-w)]U + 2(w-1)U   for w ≥ 2

where w is the key word size (in computer words),
n is the number of computer words to be transferred, and
U is a unit of time.
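The two-case expression translates directly into a small function. The Python sketch below evaluates the total in units of U; the function name `total_time` and the argument checks are illustrative assumptions.

```python
def total_time(w, n):
    """Time, in units of U, to hash on a w-word key and transfer n words."""
    if w < 1 or n < w + 1:
        raise ValueError("need w >= 1 and room for the key and end marker")
    transfer = 3 + 10 * (n - w)
    if w == 1:
        return 34 + transfer
    hashing = 40 + (w - 2) * 21
    extra_stores = 2 * (w - 1)      # one STA per key word above the lowest
    return hashing + transfer + extra_stores
```

For example, `total_time(2, 10)` adds 40 units of hashing, 83 units of transfer, and 2 units for the one extra store.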


TIMING EQUATIONS FOR GENERIC JOBS 1 AND 2

Given the above MIXAL codes, the derivation of the timing
equations for the first two generic jobs can be developed. These
equations will be developed first. Then, timing equations will
be developed for Jobs 3 and 4 using the timing results obtained
for Job 2 and some additional MIXAL code.

Job  1 

Given the F/R name, the DBMS is required to provide all of
its occurrences. For Job 1 performed on a sequential computer
there will be four different timing equations based on the
following:

1)  Hash on the F/R name whose size is one word long
    (P = 1) and the DBMS has Type I synonyms (T_{1SI1});

2)  Hash on the F/R name whose size is ≥ 2 words long
    (P ≥ 2) and the DBMS has Type I synonyms (T_{1SI2});

3)  Hash on the F/R name whose size is one word long
    (P = 1) and the DBMS has Type II synonyms (T_{1SII1});
    and

4)  Hash on the F/R name whose size is ≥ 2 words long
    (P ≥ 2) and the DBMS has Type II synonyms (T_{1SII2}).

Each timing equation represents the time to hash on the F/R name
and transfer the proper data, i.e. either G1FR or G2FR depending
on whether the DBMS has Type I or Type II synonyms, respectively.

T_{1SI1}

The timing equation T_{1SI1} represents the time to hash on
an F/R name one word long and to transfer G1FR to the user's
area. Therefore, using the previously developed equations for
(w = 1) yields the following:

    Total time = 34U + [3 + 10(n-1)]U.

Substituting n = [P + 6 + 3n_1 + n_2(P_1 + 7)], with P = 1, into
the above yields

    T_{1SI1} = [97 + 30n_1 + 10n_2(P_1 + 7)] U.

T_{1SI2}

The timing equation T_{1SI2} represents the time to hash on
an F/R name ≥ 2 words long and to transfer G1FR to the user's
area. Using the previously developed equations for w ≥ 2 yields
the following:

    Total time = [40 + (w-2)21]U + [3 + 10(n-w)]U + 2(w-1)U.

Substituting w = P and n = [P + 6 + 3n_1 + n_2(P_1 + 7)] into
the above yields

    T_{1SI2} = [59 + 23P + 30n_1 + 10n_2(7 + P_1)] U.

T_{1SII1}

The timing equation T_{1SII1} represents the time to hash on
an F/R name one word long and to transfer G2FR to the user's
area. Using the previously developed equations for w = 1 yields
the following:

    Total time = (34)U + [3 + 10(n-1)]U.

Substituting n = [P + 5 + n_2(7 + P_1)], with P = 1, into the
above yields

    T_{1SII1} = [87 + 10n_2(7 + P_1)] U.

T_{1SII2}

The timing equation T_{1SII2} represents the time to hash on
an F/R name ≥ 2 words long and to transfer G2FR to the user's
area. Using the previously developed equations for w ≥ 2 yields
the following:

    Total time = [40 + (w-2)21]U + [3 + 10(n-w)]U + 2(w-1)U.

Substituting w = P and n = [P + 5 + n_2(7 + P_1)] into the
above yields

    T_{1SII2} = [49 + 23P + 10n_2(7 + P_1)] U.
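
As a cross-check on the four Job 1 equations, the sketch below substitutes the record sizes into the general hash-and-transfer expression and compares the result with closed forms. It is an illustrative Python sketch only: the function names are hypothetical, and the record sizes n = P + 6 + 3n_1 + n_2(P_1 + 7) for G1FR and n = P + 5 + n_2(P_1 + 7) for G2FR are taken as read from the substitutions in the text (the n_2 factor in the G2FR size is inferred from the T_{1SII2} result).

```python
def hash_and_transfer_time(w: int, n: int) -> int:
    """General expression from the text, in units of U."""
    transfer = 3 + 10 * (n - w)
    if w == 1:
        return 34 + transfer
    return 40 + 21 * (w - 2) + transfer + 2 * (w - 1)


def t1_type1(P: int, n1: int, n2: int, P1: int) -> int:
    """Job 1, Type I synonyms: hash on the F/R name and transfer G1FR."""
    n = P + 6 + 3 * n1 + n2 * (P1 + 7)  # assumed size of G1FR in words
    return hash_and_transfer_time(P, n)


def t1_type2(P: int, n2: int, P1: int) -> int:
    """Job 1, Type II synonyms: hash on the F/R name and transfer G2FR."""
    n = P + 5 + n2 * (P1 + 7)  # assumed size of G2FR in words
    return hash_and_transfer_time(P, n)


def closed_forms_agree(P: int, n1: int, n2: int, P1: int) -> bool:
    """Compare the substituted expressions with the closed forms."""
    if P == 1:
        type1 = 97 + 30 * n1 + 10 * n2 * (P1 + 7)
        type2 = 87 + 10 * n2 * (P1 + 7)
    else:
        type1 = 59 + 23 * P + 30 * n1 + 10 * n2 * (P1 + 7)
        type2 = 49 + 23 * P + 10 * n2 * (P1 + 7)
    return t1_type1(P, n1, n2, P1) == type1 and t1_type2(P, n2, P1) == type2


if __name__ == "__main__":
    print(all(closed_forms_agree(P, n1, n2, P1)
              for P in range(1, 5) for n1 in range(0, 4)
              for n2 in range(0, 4) for P1 in range(1, 5)))
```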


Job  2 

Given an F/R name and a number of its attribute names, the
DBMS is required to provide a subset of the occurrences of the
F/R. For Job 2, performed on a sequential computer, there will
be four different timing equations based on the following:

1)  Hash on the names of the n_5 attributes which are
    one word long (P_1 = 1), and the DBMS has Type I
    synonyms (T_{2SI1});

2)  Hash on the names of the n_5 attributes whose size
    is ≥ 2 words long (P_1 ≥ 2) and the DBMS has Type I
    synonyms (T_{2SI2});

3)  Hash on the names of the n_5 attributes which are
    one word long (P_1 = 1) and the DBMS has Type II
    synonyms (T_{2SII1}); and

4)  Hash on the names of the n_5 attributes whose size is
    ≥ 2 words long (P_1 ≥ 2) and the DBMS has Type II
    synonyms (T_{2SII2}).

To perform Job 2 when the DBMS has Type I synonyms requires
that Job 1 also be performed. This is necessary because some of
the n_5 attributes' values may not be stored in the F/R in which
the job was specified. Therefore, the DBMS would require the
information F/RCM contained in G1FR to determine the F/Rs that
are related to the F/R specified in the original job.

For Type I synonyms, each timing equation represents the
time to hash on n_5 attribute names, to perform n_5 transfers with
the proper data, i.e. G1A, and to perform Job 1 for the given F/R
name. For Type II synonyms, each timing equation represents the
time to hash on n_5 attribute names and to perform n_5 transfers
with the proper data, i.e. G2A.


T_{2SI1}

The timing equation T_{2SI1} represents the time to hash on
n_5 attribute names which are one word long, perform n_5 transfers
of the proper data (G1A) and to perform Job 1. Therefore, using
the previously developed equations for (w = 1) yields the
following:

    Total time = n_5(34)U + n_5[3 + 10(n-w)]U + T_{1SI1} (or T_{1SI2}).

Substituting w = P_1 and n = [P_1 + 10 + n_6 + n_3(P+3)] in the
above yields

    T_{2SI1} = n_5[137 + 10(n_6 + n_3(P+3))]U + T_{1SI1} (or T_{1SI2}).

T_{2SI2}

The timing equation T_{2SI2} represents the time to hash on
n_5 attribute names which are ≥ 2 words long, perform n_5 transfers
of the proper data (G1A) and to perform Job 1. Therefore, using
the previously developed equations for w ≥ 2 yields the following:

    Total time = n_5[40 + (w-2)21]U + n_5[3 + 10(n-w)]U
                 + n_5 2(w-1)U + T_{1SI1} (or T_{1SI2}).

Substituting w = P_1 and n = [P_1 + 10 + n_6 + n_3(P+3)] in the
above yields

    T_{2SI2} = n_5[99 + 23P_1 + 10(n_6 + n_3(P+3))]U
               + T_{1SI1} (or T_{1SI2}).

T_{2SII1}

The timing equation T_{2SII1} represents the time to hash on
n_5 attribute names which are one word long and perform n_5 transfers
of the proper data (G2A). Therefore, using the previously
developed equations for (w = 1) yields the following:

    Total time = n_5(34)U + n_5[3 + 10(n-w)]U.

Substituting w = P_1 and n = [P_1 + n_6 + n_3(P+3) + 10]
in the above yields

    T_{2SII1} = n_5[137 + 10(n_6 + n_3(P+3))]U.

T_{2SII2}

The timing equation T_{2SII2} represents the time to hash on
n_5 attribute names which are ≥ 2 words long and perform n_5 transfers
of the proper data (G2A). Therefore, using the previously
developed equations for w ≥ 2 yields the following:

    Total time = n_5[40 + (w-2)21]U + n_5[3 + 10(n-w)]U
                 + n_5 2(w-1)U.

Substituting w = P_1 and n = [P_1 + n_6 + n_3(P+3) + 10] in
the above yields

    T_{2SII2} = n_5[99 + 23P_1 + 10(n_6 + n_3(P+3))]U.
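
The Job 2 equations follow the same pattern: n_5 independent hash-and-transfer operations, plus the Job 1 time when the DBMS has Type I synonyms. The Python sketch below is illustrative only; the function names are hypothetical, and the Job 1 time is passed in as a plain number rather than recomputed.

```python
def hash_and_transfer_time(w: int, n: int) -> int:
    """General expression from the text, in units of U."""
    transfer = 3 + 10 * (n - w)
    if w == 1:
        return 34 + transfer
    return 40 + 21 * (w - 2) + transfer + 2 * (w - 1)


def attribute_record_words(P1: int, n6: int, n3: int, P: int) -> int:
    """Size n of a G1A or G2A record, as substituted in the text."""
    return P1 + 10 + n6 + n3 * (P + 3)


def t2_type2(n5: int, P1: int, n6: int, n3: int, P: int) -> int:
    """Job 2, Type II synonyms: n5 hashes and n5 transfers of G2A."""
    n = attribute_record_words(P1, n6, n3, P)
    return n5 * hash_and_transfer_time(P1, n)


def t2_type1(n5: int, P1: int, n6: int, n3: int, P: int, job1_time: int) -> int:
    """Job 2, Type I synonyms: as Type II, plus the time to perform Job 1."""
    return t2_type2(n5, P1, n6, n3, P) + job1_time


if __name__ == "__main__":
    # One-word attribute names: expect 137 + 10(n6 + n3(P+3)) per attribute.
    print(t2_type2(n5=1, P1=1, n6=2, n3=3, P=2))
    # Three-word names: expect 99 + 23*P1 + 10(n6 + n3(P+3)) per attribute.
    print(t2_type2(n5=4, P1=3, n6=2, n3=3, P=2))
```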

TIMING  EQUATIONS  FOR  GENERIC  JOB  3 

Job 3: Given an attribute's occurrence(s) of Type II
synonyms, the DBMS is required to modify (e.g. delete, change
value, etc.) the attribute and its synonyms' occurrences in all
their associated F/Rs. The derivation of the timing equations for
Job 3 is different from that used for Jobs 1 and 2. Consider
the following two situations for Job 3. The first situation is
that, say, Job 1 has been performed and it is then decided that
an attribute in the F/R in question needs to be modified. The
procedure would be to hash on the attribute's name and then
transfer its G2A data to the user's area. Then the DBMS must
locate, through the synonym pointers of the G2A, the G2A data of
all the attribute's synonyms and transfer those data to the
user's area.

The second situation that can occur for Job 3 is that the
G2A data of the attribute in question are currently in the
user's area. This could have occurred through a previous Job 2
run. The procedure for Job 3 would involve locating, through the
synonym pointers of the G2A, the G2A data of all the attribute's
synonyms and transferring those data to the user's area.

T_{3SII1}

The timing equation T_{3SII1} is for the first situation and
represents the time to perform Job 3 on a sequential computer
with Type II synonyms. It represents the 1st equation for Job 3
and includes the time to:

1)  hash on an attribute's name one word long;

2)  transfer its G2A to the user's area; and

3)  search and transfer the attribute's synonym's
    G2A data.

The timing equation for the first two parts is simply T_{2SII1}
solved for n_5 = 1, i.e.

    T_{2SII1} = (34)U + [3 + 10(n_6 + n_3(P+3) + 10)]U,

    T_{2SII1} = [137 + 10(n_6 + n_3(P+3))]U.

Once the G2A data are in the user's area, the third part of T_{3SII1}
can be accomplished by the following MIXAL code which will
provide the user's area with the attribute's synonym's G2A data.
This code would be executed directly following the general
transfer code associated with T_{2SII1}.


Line  Loca-  Opera-
No.   tion   tion    Address        NTS          Explanation

02           LD5     "BEGIN+P1+5"   (2)(1)       Loads r5 with the value in
                                                 location "BEGIN+P1+5", i.e.
                                                 the number of pointers (n_6)

03           LD6     "BEGIN+P1+6"   (2)(1)       Loads r6 with the value in
                                                 location "BEGIN+P1+6", i.e.
                                                 the first address (or pointer)
                                                 to a synonym's G2A data

04           ENT4    "BEGIN+P1+6"   (1)(1)       Sets r4 to the value
                                                 "BEGIN+P1+6"

05           ENTA    0              (1)(1)       Sets rA to zero

06           STA     L              (2)(1)       Stores rA in address L
                                                 (i.e. stores a zero)

07    S.O.   COMP5   L              (2)(n_6+1)   Compare r5 with the contents
                                                 in L (i.e. zero)

08           JE      DONE           (1)(n_6+1)   If r5 is equal to zero,
                                                 jump to DONE

09           INC2    1              (1)(n_6)     Add 1 to r2, which advances
                                                 the location in the user's
                                                 work area

10    N.W.   COMP3   0,6            (2)(n)       Compare the contents of the
                                                 location stored in r6 with the
                                                 contents of r3 (i.e. 99, end
                                                 of record marker)

11           JE      N.S.           (1)(n)       If equal, jump to N.S.
                                                 (next synonym)

12           LDA     0,6            (2)(n)       Load rA with the contents of
                                                 the location stored in r6
                                                 (a word in a G2A)

13           STA     0,2            (2)(n)       Store the contents of rA into
                                                 the location stored in r2
                                                 (stores a word of a G2A record
                                                 into the user's work area)

14           INC2    1              (1)(n)       Add 1 to r2

15           INC6    1              (1)(n)       Add 1 to r6

16           JMP     N.W.           (1)(n)       Jump to N.W. (next word)

17    N.S.   LDA     0,6            (2)(n_6)     Load rA with the contents of
                                                 the address in r6 (i.e. 99)

18           STA     0,2            (2)(n_6)     Store the contents of rA (99)
                                                 in the address contained in r2
                                                 (i.e. the next word in the
                                                 user's work area)

19           DEC5    1              (1)(n_6)     Subtract 1 from r5

20           INC4    1              (1)(n_6)     Add 1 to r4

21           LD6     0,4            (2)(n_6)     Loads r6 with the value in the
                                                 location stored in r4 (i.e.
                                                 the address of the next G2A
                                                 record)

22           JMP     S.O.           (1)(n_6)     Jump to S.O. (start over)

23    DONE   END

Total time = (\sum_{Statement 02}^{23} NTS) U
           = 11U + 13(n_6)U + 10(n_6)(n)U
In the above code S.O. represents Start Over, N.S. represents
Next Synonym, N.W. represents Next Word, and "BEGIN+P1+5" and
"BEGIN+P1+6" are the locations containing n_6 and the first
pointer, respectively. The total time for the above code, by
replacing n with [P_1 + n_6 + n_3(P+3) + 10] for G2A, is

    Total time = (11)U + 13(n_6)U + 10(n_6)[P_1 + n_6 + n_3(P+3) + 10]U.

Therefore, summing the times for the three parts of T_{3SII1}
yields:

    T_{3SII1} = T_{2SII1} (with n_5 = 1) + 11U + 113(n_6)U
                + 10n_6[P_1 + n_6 + n_3(P+3)]U.
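
The total for the synonym-transfer code can be checked by summing the NTS column of the statement table. The Python sketch below is illustrative only; the grouping of statements by execution count is an assumption chosen to match the stated total, in which statements 10 through 16 are counted n times for each of the n_6 synonyms.

```python
def synonym_transfer_time(n6: int, n: int) -> int:
    """Sum of NTS (in units of U) over statements 02 through 22."""
    once = [2, 2, 1, 1, 2]                # statements 02-06, executed once
    loop_test = [2, 1]                    # statements 07-08, executed n6 + 1 times
    per_synonym = [1, 2, 2, 1, 1, 2, 1]   # statements 09 and 17-22, n6 times
    per_word = [2, 1, 2, 2, 1, 1, 1]      # statements 10-16, n6 * n times (assumed)
    return (sum(once)
            + sum(loop_test) * (n6 + 1)
            + sum(per_synonym) * n6
            + sum(per_word) * n6 * n)


def g2a_words(P1: int, n6: int, n3: int, P: int) -> int:
    """Size n of a G2A record, as substituted in the text."""
    return P1 + n6 + n3 * (P + 3) + 10


if __name__ == "__main__":
    n6, P1, n3, P = 2, 1, 3, 2
    n = g2a_words(P1, n6, n3, P)
    total = synonym_transfer_time(n6, n)
    # Closed form after substitution: 11 + 113*n6 + 10*n6*(P1 + n6 + n3*(P + 3))
    closed = 11 + 113 * n6 + 10 * n6 * (P1 + n6 + n3 * (P + 3))
    print(total, closed)
```

With no synonyms (n_6 = 0) only the setup statements and one loop test execute, which gives the constant 11U.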


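T_{3SII2}

The timing equation T_{3SII2} is for the first situation and
represents the time to perform Job 3 on a sequential computer
with Type II synonyms. It represents the 2nd equation for Job 3
and includes the time to:

1)  hash on an attribute's name ≥ 2 words long;

2)  transfer its G2A data; and

3)  search and transfer the attribute's synonym's
    G2A data.

By the same reasoning as for T_{3SII1}, the first two parts are
given by T_{2SII2} solved for n_5 = 1 and the third part by the
MIXAL code above, so that

    T_{3SII2} = T_{2SII2} (with n_5 = 1) + 11U + 113(n_6)U
                + 10n_6[P_1 + n_6 + n_3(P+3)]U.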


T_{3SII3}

The timing equation T_{3SII3} is for the second situation and
represents the time to perform Job 3 on a sequential computer
with Type II synonyms. It represents the 3rd equation for Job 3
and includes the time to search the G2A data in the user's area
and transfer the attribute's synonym's G2A data to the user's
area. This timing equation is obtained by realizing that to
accomplish this searching and transfer of data the MIXAL code
for situation one must be executed. It is essentially equivalent
to the third part of the timing equations for the first
situation. The only difference is that the above code would
be prefaced by the following two statements, because T_{2SII1} or
T_{2SII2} are not assumed to be executed before the above MIXAL
code.
Line  Loca-  Opera-
No.   tion   tion    Address   NTS    Explanation

00           ENT2    Z         (1)    Set r2 equal to Z, the starting
                                      location where the G2A data are
                                      to be stored

01           ENT3    =99=      (1)    Set r3 equal to 99, signifying
                                      an end of record

Total time = (\sum_{Statement 00}^{01} NTS) U = 2U.

Therefore, summing the times for the total MIXAL code yields

    Total time = (11)U + 13(n_6)U + 10(n_6)(n)U + 2U.

Substituting n = [P_1 + n_6 + n_3(P+3) + 10] in the above
yields

    T_{3SII3} = 13U + 113(n_6)U + 10(n_6)[P_1 + n_6 + n_3(P+3)]U.

TIMING EQUATIONS FOR GENERIC JOB 4

Given an attribute's occurrence(s) of Type I synonyms,
the DBMS is required to modify (e.g. delete, change value, etc.)
the attribute's stored synonym occurrence(s) in its associated
F/R. Deriving the timing equations for Job 4 requires an approach
similar to Job 3. Consider the following two situations for
Job 4. The first situation is that, say, Job 1 has been performed
and it is then decided that an attribute in the F/R in
question needs to be modified. The procedure would be to hash on
the attribute's name and then transfer its G1A data to the user's
area. Then the DBMS must locate, through the first synonym
pointer, the G1A data for the stored synonym and transfer these
data to the user's area.

The second situation that can occur for Job 4 is that the
G1A data of the attribute in question are currently in the
user's area. This could have occurred through a previous Job 2
run. The procedure for Job 4 would involve locating, through the
first synonym pointer of the G1A, the G1A data for the stored
synonym and transferring these data to the user's area.

T_{4SI1}

The timing equation T_{4SI1} is for the first situation and
represents the time to perform Job 4 on a sequential computer
with Type I synonyms. It represents the 1st equation for Job 4
and includes the time to:

1)  hash on an attribute's name one word long;

2)  transfer its G1A data to the user's area; and

3)  search and transfer the stored synonym's G1A data.

The timing equation for the first two parts is simply T_{2SI1}
solved for n_5 = 1 and minus the time to perform Job 1, i.e.

    T_{2SI1} = [137 + 10(n_6 + n_3(P+3))] U.

Once the G1A data are in the user's area, the third part of T_{4SI1}
can be accomplished by the following MIXAL code, which will provide
in the user's area the attribute's stored synonym G1A data.

This code would be executed directly following the general
transfer code associated with T_{2SI1}.


Line  Loca-  Opera-
No.   tion   tion    Address        NTS    Explanation

02           LD5     "BEGIN+P1+5"   (2)    Loads r5 with the value in
                                           location "BEGIN+P1+5", i.e. the
                                           address of the first word for the
                                           stored synonym's data (G1A)

03           LD6     "BEGIN+P1+6"   (2)    Loads r6 with the value in
                                           location "BEGIN+P1+6", i.e. the
                                           address of the first word for the
                                           2nd synonym's data (G1A)

04           ENTA    0              (1)    Sets rA to zero

05           STA     L              (2)    Stores the contents of rA into
                                           location L, i.e. stores a zero
                                           in L

06           COMP5   L              (2)    Compares the contents of r5 with
                                           the contents of L to check whether
                                           the address is equal to zero

07           JE      N.S.           (1)    Jump to N.S. (next synonym) if
                                           they are equal. This signifies
                                           that it has the stored value.

08           COMP6   L              (2)    Compares the contents of r6 with
                                           the contents of L to check if the
                                           address is equal to zero

09           JE      IIOS           (1)    Jump to IIOS (it is its own
                                           synonym) if they are equal,
                                           signifying that it has no synonyms

10           INC2    1              (1)    Add one to the contents of r2

11    N.W.   COMP3   0,5            2(n)   Compare contents of r3 with the
                                           contents of the word whose address
                                           is stored in r5 (checks for end
                                           of record)

12           JE      DONE           1(n)   If they are equal, jump to DONE

13           LDA     0,5            2(n)   Load rA with the contents of the
                                           word whose address is in r5

14           STA     0,2            2(n)   Store the contents of rA into the
                                           word whose address is in r2

15           INC2    1              1(n)   Add one to the contents of r2

16           INC5    1              1(n)   Add one to the contents of r5

17           JMP     N.W.           1(n)   Jump to N.W. (next word)

      N.S.   END

      IIOS   END

      DONE   END

Total time = (\sum_{Statement 02}^{17} NTS) U = (14)U + 10(n)U.

Replacing n with [P_1 + n_6 + n_3(P+3) + 10] in the above gives
the total time for the above code:

    Total time = (14)U + 10[P_1 + n_6 + n_3(P+3) + 10]U.

Therefore, summing the times for the three parts of T_{4SI1} yields

    T_{4SI1} = (251)U + 10[P_1 + 2n_6 + 2n_3(P+3)]U.


T_{4SI2}

The timing equation T_{4SI2} is for the first situation and
represents the time to perform Job 4 on a sequential computer
with Type I synonyms. It represents the 2nd equation for Job 4
and includes the time to:

1)  hash on an attribute's name ≥ 2 words long;

2)  transfer its G1A data to the user's area; and

3)  search and transfer the stored synonym's G1A
    data.

The timing equation for the first two parts is simply T_{2SI2}
solved for n_5 = 1 and minus the time to perform Job 1, i.e.

    T_{2SI2} = [99 + 23P_1 + 10(n_6 + n_3(P+3))] U.

Once the G1A data are in the user's area, the same MIXAL code
applied to T_{4SI1} would be executed to obtain the third part of
T_{4SI2}. Therefore, summing the times for the 3 parts of T_{4SI2}
yields

    T_{4SI2} = (213 + 33P_1)U + 20[n_6 + n_3(P+3)]U.
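
The two first-situation equations for Job 4 can be checked by adding the hash-and-transfer time for one attribute to the 14U + 10(n)U cost of the stored-synonym transfer code. The Python sketch below is illustrative only; the function names are hypothetical and the G1A record size n = P_1 + n_6 + n_3(P+3) + 10 is taken from the substitution in the text.

```python
def hash_and_transfer_time(w: int, n: int) -> int:
    """General expression from the text, in units of U."""
    transfer = 3 + 10 * (n - w)
    if w == 1:
        return 34 + transfer
    return 40 + 21 * (w - 2) + transfer + 2 * (w - 1)


def g1a_words(P1: int, n6: int, n3: int, P: int) -> int:
    """Size n of a G1A record, as substituted in the text."""
    return P1 + n6 + n3 * (P + 3) + 10


def t4_first_situation(P1: int, n6: int, n3: int, P: int) -> int:
    """Job 4 after Job 1: hash, transfer G1A, then move the stored synonym's G1A."""
    n = g1a_words(P1, n6, n3, P)
    first_two_parts = hash_and_transfer_time(P1, n)  # T2SI with n5 = 1, less Job 1
    third_part = 14 + 10 * n                         # stored-synonym transfer code
    return first_two_parts + third_part


if __name__ == "__main__":
    # P1 = 1: expect 251 + 10(P1 + 2*n6 + 2*n3*(P + 3))
    print(t4_first_situation(P1=1, n6=2, n3=3, P=2))
    # P1 = 3: expect 213 + 33*P1 + 20(n6 + n3*(P + 3))
    print(t4_first_situation(P1=3, n6=2, n3=3, P=2))
```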


T_{4SI3}

The timing equation T_{4SI3} is for the second situation and
represents the time to perform Job 4 on a sequential computer
with Type I synonyms. It represents the 3rd equation for
Job 4 and includes the time to search G1A data in the user's
area and transfer the attribute's stored synonym's G1A data
to the user's area. This timing equation is obtained by
realizing that to accomplish this searching and transfer of data
the MIXAL code for situation one must be executed. It is
essentially equivalent to the third part of the timing equations
for the first situation. The only difference is that the above
code would be prefaced by the following two statements because
T_{2SI1} or T_{2SI2} are not assumed to be executed before the
above MIXAL code.

Line  Loca-  Opera-
No.   tion   tion    Address   NTS    Explanation

00           ENT2    Z-1       (1)    Sets r2 with a location one less
                                      than where the data are to begin
                                      being stored

01           ENT3    =99=      (1)    Sets r3 with a value 99, i.e. end
                                      of record marker

Total time = (\sum_{Statement 00}^{01} NTS) U = 2U.

Therefore, summing the times for the total MIXAL code yields

    Total time = (14)U + 10(n)U + 2U.
Substituting n = [P_1 + n_6 + n_3(P+3) + 10] in the above
yields

    T_{4SI3} = [116 + 10(P_1 + n_6 + n_3(P+3))] U.

EVALUATION RESULTS

INTRODUCTION

Figures VII-8 through VII-13 represent the results of
evaluating the DDP by comparing its timing equations with those
timing equations developed for a sequential computer, with a unit
of time U chosen as one microsecond. The timings for the DDP were
based on the STARAN AP and are shown as the micro-function timings
in the first portion of this chapter. It is possible, by changing
these timings, to evaluate different combinations of conventional
sequential computers with different hardware making up the
components of a DDP.

The timing equations plotted in the following figures are
for a time increment to perform four generic jobs submitted to a
DBMS. The timing increment for the DDP is from when the sequential
computer's CPU notifies the DDP control of a job to be executed
until the DDP supplies the TUMA with its results (see Fig. VI-1).
The timing increment for the sequential computer is the time to
locate data in the data dictionary/directory and transfer them to
the user's memory area. The data dictionary/directory data are
assumed to reside in the sequential computer's main memory.

[Figure VII-11. Maximum/Minimum Times for the Dictionary/Directory
Processor and a Sequential Computer for Job 2 and Type II Synonyms.
Curves: T_{2SII2} (MAX), T_{2SII1} (MIN), T_{2II(U)} (MAX),
T_{2II(L)} (MIN). Abscissa: n_5, number of attributes per job.]

The curves in the figures represent the maximum and minimum
times for the sequential computer and the DDP. The maximum and
minimum times are obtained by solving the equations for the
curves with feasibly high and low values of their parameters.

The maximum feasible values of some of these parameters were
chosen because of the physical constraints of the STARAN AP (i.e.
P, P_1, and TA). These curves provide a bounding in time per job
of the DDP and the sequential computer and a comparison in time
between the sequential computer and the DDP.
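
The bounding procedure described above can be sketched for the sequential computer's Job 1, Type I equations. The Python sketch below is illustrative only: the low and high parameter values are hypothetical placeholders, not the values used for the figures, and only the choice U = one microsecond comes from the text.

```python
U_MICROSECONDS = 1.0  # unit of time U, chosen in the text as one microsecond


def t1_type1_units(P: int, n1: int, n2: int, P1: int) -> int:
    """Job 1, Type I synonyms, in units of U (closed forms from the text)."""
    common = 30 * n1 + 10 * n2 * (P1 + 7)
    if P == 1:
        return 97 + common          # one-word F/R name
    return 59 + 23 * P + common     # F/R name of two or more words


def job1_type1_bounds_ms(n1: int, low: dict, high: dict) -> tuple:
    """Lower and upper bound, in milliseconds, for a given n1."""
    lower = t1_type1_units(n1=n1, **low) * U_MICROSECONDS / 1000.0
    upper = t1_type1_units(n1=n1, **high) * U_MICROSECONDS / 1000.0
    return lower, upper


if __name__ == "__main__":
    # Hypothetical "feasibly low" and "feasibly high" parameter choices.
    low = {"P": 1, "n2": 1, "P1": 1}
    high = {"P": 4, "n2": 5, "P1": 4}
    for n1 in (1, 10, 50, 100):
        lo_ms, hi_ms = job1_type1_bounds_ms(n1, low, high)
        print(f"n1={n1:3d}: {lo_ms:.3f} ms to {hi_ms:.3f} ms")
```

Plotting the two bounds against n_1 would give the sequential-computer pair of curves of the kind shown in the figures; the DDP curves would need the micro-function timings from the first portion of the chapter.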

Jobs  1 and  2 are  contained  in  two  figures.  One  figure  is 
for  Type  I synonyms  and  the  other  for  Type  II  synonyms.  Jobs  3 
and  4 are  for  modifying  synonyms  Type  I and  II.  Note  that 
synonyms  exist  in  a data  base  if  two  or  more  attributes  have 
different  names  and  descriptors  or  different  names  for  the  same 
entity  in  the  real  world  and  that  Type  I and  II  synonyms  are 
methods  of  handling  synonyms  in  a computer,  where; 

1)  Type  I.  Store  the  occurrences  of  the  synonyms  in 
only  one  description  and  develop  additional  functions 
to  convert  the  different  occurrences  from  one 
description  to  another. 

2)  Type  II.  Store  the  occurrences  of  the  synonyms 

in  their  different  descriptions,  e.g.  coding,  size, 
etc. 

There are three basic assumptions that have been made in the
development of these figures. They are:

1)  all the jobs submitted to the DBMS are correct,
    i.e. they will not be aborted;

2)  all equal-to searches (ETS) performed on parameter
    values are found; and

3)  the hashing algorithm employed for the sequential
    computer implementation does not consider
    collisions.

Job  1 

Job 1 requires the DBMS to provide all the occurrences of
an F/R when provided with only the F/R name. The results of
analyzing the two implementations on a sequential computer and
the DDP are shown in Figs. VII-8 and VII-9. The lower two curves
of Fig. VII-8 (T_{1I(L)}, T_{1I(U)}) and Fig. VII-9 (T_{1II(L)},
T_{1II(U)}) show the minimum and maximum times for the DDP
implementation. The upper curves (T_{1SI1}, T_{1SI2}) and
(T_{1SII1}, T_{1SII2}) show the minimum and maximum times for the
sequential computer implementation. The ordinate is in
milliseconds and the abscissa represents values of n_1, the number
of attributes associated with an F/R.

The other parameters are:

1)  P, the number of computer words per F/R name;

2)  P_1, the number of computer words per attribute name;

3)  TA, the total number of attributes in the data base;
    and

4)  n_2, the number of F/Rs related to a given F/R and
    supported by the DBMS.

Comparing Figs. VII-8 and VII-9 reveals that the type of
synonym only slightly affects the timings. The DDP is faster
than the sequential computer and, as n_1 increases, the ratio of
the sequential computer time to the DDP time also increases.
Job 2

Job 2 requires the DBMS to provide a subset of the
occurrences of an F/R when given the F/R's name and a number of
its attribute names. The lower two curves in Fig. VII-10
(T_{2I(L)}, T_{2I(U)}) and Fig. VII-11 (T_{2II(L)}, T_{2II(U)}) show
the minimum and maximum times for the DDP implementation. The upper
two curves (T_{2SI1}, T_{2SI2}) and (T_{2SII1}, T_{2SII2}) show the
minimum and maximum times for the sequential computer
implementation. The ordinate is in milliseconds and the abscissa
represents values of n_5, the number of attributes specified in the
job. The other parameters are:

1)  P, the number of computer words per F/R name;

2)  P_1, the number of computer words per attribute name;

3)  TA, the total number of attributes in the data base;

4)  n_3, the number of associated F/Rs of a given attribute;

5)  n_4, the number of associated F/Rs of a given number
    (n_5) of attributes;

6)  n_6, the number of synonyms of a given attribute; and

7)  n_2, the number of F/Rs related to a given F/R and
    supported by the DBMS.

Comparing Figs. VII-10 and VII-11 reveals that Type I synonyms
increase the upper bound for the sequential computer implementation.
The DDP is always faster than the sequential computer when n_5 is
greater than six, and their lower bounds are approximately equal
where n_5 is equal to one. The ratio of the sequential computer
time to the DDP time increases as n_5 increases.


Job 3

Job 3 requires the DBMS to modify an attribute's occurrence
in a data base which has Type II synonyms. The DBMS is required
to modify all F/Rs in which the attribute and the attribute's
synonyms are contained. The curves labeled T_{3II(L)} and T_{3II(U)}
in Fig. VII-12 are the lower and upper boundary curves for the
DDP. The T_{3SII1} and T_{3SII2} are the lower and upper boundary
curves for the sequential computer implementation assuming Job 1
had been previously run. The T_{3SII3(L)} and T_{3SII3(U)} are the
lower and upper boundary curves for the sequential computer
implementation assuming Job 2 had been previously run. The ordinate
is in milliseconds and the abscissa represents values of n_6, the
number of synonyms of a given attribute. The other parameters are:

1)  P, the number of computer words per F/R name;

2)  P_1, the number of computer words per attribute name;

3)  n_4, the number of associated F/Rs of a given number
    (n_5) of attributes; and

4)  TA, the total number of attributes in the data base.

Comparing the curves in Fig. VII-12 reveals that the lower
bound of the DDP is always faster than a sequential computer
implementation and the ratio of the sequential computer time to
the DDP time increases as n_6 increases. The curves also reveal
that Job 3 performed after Job 2 is faster than if performed
after Job 1.
Job 4

Job 4 requires the DBMS to modify an attribute's occurrence
in a data base which has Type I synonyms. The DBMS is required
to modify the F/R in which the stored attribute's synonym is
contained. The curves labeled T_{4I(L)} and T_{4I(U)} in Fig. VII-13
are the lower and upper boundary curves for the DDP. The T_{4SI1}
and T_{4SI2} are the lower and upper boundary curves for the
sequential computer implementation assuming Job 1 had been
previously run. The T_{4SI3(L)} and T_{4SI3(U)} are the lower and
upper boundary curves for the sequential computer implementation
assuming Job 2 had been previously run. The ordinate is in
milliseconds and the abscissa represents values of n_3, the number
of associated F/Rs of a given attribute. The other parameters are:

1)  P, the number of computer words per F/R name;

2)  P_1, the number of computer words per attribute name;

3)  n_6, the number of synonyms of a given attribute;
    and

4)  TA, the total number of attributes in the data base.

Comparing the curves in Fig. VII-13 reveals that the
lower bound of the DDP is always faster than a sequential computer
implementation and the ratio of the sequential computer time to
the DDP time increases as n_3 increases. The curves also reveal
that Job 4 performed after Job 2 is in general faster than if
performed after Job 1.


SUMMARY

This  concludes  the  results  of  evaluating  the  DDP.  This 
chapter  has  provided  a description  of  the  development  and 
utilization  of  timing  equations.  These  equations  modeled  both 
the  DDP  and  a sequential  computer's  implementation  of  the 
functions  performed  by  the  DDP.  The  results  of  evaluating  the 
DDP  are  presented;  in  general,  the  DDP  is  faster  than  the 
sequential  computer  implementation.  The  primary  contribution 
provided  in  this  chapter  was  the  mathematical  equations  developed 
such  that  any  conventional  sequential  computer  and  different  DDP 
hardware  components  may  be  evaluated. 

Chapter VIII

RESULTS,  CONCLUSIONS,  AND  FUTURE  RESEARCH 
INTRODUCTION 

The  first  three  chapters  contain  an  introduction,  a review 
of  the  pertinent  literature,  the  problem  definition,  and  a 
methodology  for  approaching  the  problem.  Chapters  IV  through  VII 
contain  a mathematical  base  for  data  base  management  which  includes 
four  levels  of  data,  a proposed  hardware  design,  and  an  evaluation 
of  the  design.  The  remainder  of  this  chapter  concentrates  on  the 
contents  of  these  four  chapters. 

Chapter  IV  contained  a description  of  a mathematical  base  for 
data  base  management.  This  base  was  composed  of  set  theory  and 
an  extension  of  set  theory  called  Data  Processing  (DP)  sets.  The 
relationships  between  sets  and  DP  sets  were  also  provided.  The 
most interesting level of the mathematical base presented dealt with
characteristic functions, characteristic sets, and the properties
of each. This level provided a convenient method of implementing
this mathematical base in computer hardware.

A language capability for modeling the levels and functions
of Data Base Management is provided by sets and DP sets. Chapter V
contains a presentation of this modeling pertaining to the
following four levels of data:

1) the user computer interface (Reserved Word);

2) the attribute and file or relationship (F/R) names (Data Name);

3) the modifiers of the attribute and F/R names (Data
Descriptors); and

4) the occurrences of the attributes and F/Rs (Data
Occurrence).

DBM  functions  are  not  divided  into  any  specific  levels.  They  are 
dynamic  and  operate  on  the  above  four  levels  of  data  to  perform 
the  jobs  requested  by  the  user  of  a DBM  System  (DBMS). 

The sixth chapter contained a comparison of those levels of
DBM described in Chapter V with special hardware already developed
and described in the literature. It was determined that the
Data Dictionary and part of the Data Directory (or parts of the
Data Name and Data Descriptor levels) had not previously been
totally addressed by the hardware community. The rest of the
chapter describes a hardware implementation of a data dictionary
and partial directory, or Data Dictionary/Directory Processor
(DDP). The description is presented in four steps. The first
step interfaced the hardware with a sequential computer. Next,
a design of the hardware at a more detailed level considering
its control, data storage, and data transfer was given. The
hardware at the logic gate and flip-flop levels was provided in
the third step. The final step demonstrated how the DDP would
perform its DBM functions.

The seventh chapter contained an evaluation of the hardware
design of the DDP. The mathematical equations developed for
this evaluation can be used to evaluate any conventional sequential
computer as well as various DDP hardware components. The
evaluation was provided through five steps which concluded by
comparing the times required by the DDP and a sequential computer
to perform four basic, yet all-encompassing, jobs that could be
submitted to a DBMS.

This chapter begins by providing some basic results and
conclusions that can be drawn from Chapters IV through VII. This
is followed by a discussion of three areas where future research
should be performed.

RESULTS  AND  CONCLUSIONS 

To  complete  the  documentation  of  the  results  obtained  from 
this  research,  two  areas  need  to  be  addressed.  The  first  is 
concerned  with  tying  together  the  work  described  in  Chapters  IV 
and  V with  that  described  in  Chapters  VI  and  VII.  The  second 
is concerned with providing some numerical values resulting
from the evaluation contained in Chapter VII.

There are important relationships tying together the contents
of Chapters IV and V with those of Chapters VI and VII. In a
previous chapter it was shown that the mathematical base developed
(see Chapter IV) can be used (see Chapter V) to model DBM from the
level seen and utilized by a user down to the bit level of a
digital computer. Chapters VI and VII provided a description and
an evaluation of a proposed hardware implementation of some of
those DBM functions modeled in Chapter V. These functions
operated on those levels of data pertaining to the data dictionary
and a partial data directory (DDP). This hardware implementation
of a DDP was developed by implementing portions of the mathematical
base described in Chapter IV. This can be seen by comparing the
properties of Data Processing Characteristic sets (DPCSs) to
those functions discussed in Chapters VI and VII and referred to
in Table VII-1. To be specific, a single equal to search in an
AM (see Table VII-1, Function I, Macro-function SETS) is an
implementation in hardware of property XXIII for DPCSs. The
macro-functions SYNO and RELA of Functions I and IV, respectively,
in Table VII-1 are implementations of DPCS property XXIV.
Finally, the ATFR macro-function of Function IV in Table VII-1 is
an implementation of DPCS property XXI. Therefore, sets and DP
sets not only provide a language for modeling DBM from the user's
level down to the bit level but also provide a convenient method
for implementing some of the DBM functions in hardware.

The DDP was evaluated (see Chapter VII) by comparing the
time for the DDP and a sequential computer to execute the same
jobs. These jobs were created to encompass the most basic or
generic jobs a user would submit to a DBM system. The jobs are
defined below, followed by the definitions of those parameters
that influence the times for the DDP and the sequential computer
to perform these generic jobs. Finally, Tables VIII-1 through
VIII-3 will be discussed. They contain the numerical values of
the minimum and maximum times for the DDP and the sequential
computer to perform the generic jobs for a range of values of
the above-mentioned parameters.

The four generic jobs that are created are:

1) Job 1 - to provide all the occurrences of a File/
Relationship (F/R) given only the F/R name;

2) Job 2 - to provide a subset of the occurrences of
an F/R given the F/R name and some operation
on a number (n_5) of its attributes;

3) Job 3 - to modify all the occurrences of an attribute
in a data base with Type II synonyms;* and

4) Job 4 - to modify all the occurrences of an attribute
in a data base with Type I synonyms.*

The parameters that influence the times to perform the above
generic jobs are:

1) P, the number of computer words per F/R name;

2) P_1, the number of computer words per attribute name;

3) n_1, the number of F/Rs related to a given F/R and
supported by the DBMS;

4) n_2, the number of attributes in a given F/R;

5) n_3, the number of associated F/Rs of a given attribute;

6) n_4, the number of associated F/Rs of a given number
(n_5) of attributes;

7) n_5, the number of attributes specified in Job 2;

8) n_6, the number of synonyms of a given attribute; and

9) TA, the total number of attributes in the data base.

Table VIII-1 contains the minimum and maximum times in
milliseconds for Job 1. The three double columns of data are
the times for an F/R which has 10, 20, and 40 (n_2) attributes.
The first two rows of data provide the times for the sequential
computer with Type I and Type II synonyms, respectively. The
following two rows of data provide the times for the DDP with
Type I and Type II synonyms, respectively. The parameters and
their values for the minimum columns of data are P = P_1 = n^ = 1
and TA = 256. The parameters and their values for the maximum
columns of data are P = P_1 = n^ = 8 and TA = 1024.

* Note: Type I and II synonyms are methods of handling synonyms
in a computer where:

1) Type I - The occurrences of the synonyms are stored in only
one description and additional functions are developed to
convert the different occurrences from one description to
another; and

2) Type II - The occurrences of the synonyms are stored in
their different descriptions.

Table VIII-2 contains the minimum and maximum times in
milliseconds for Job 2. The three double columns of data are
the times for an F/R which has 10, 20, and 40 (n_2) attributes.
The first four rows of data are for n_5 = 4, the next four rows
of data are for n_5 = 8, and the last four rows of data are for
n_5 = 16. The first two rows of data in each group of four provide
the times for the sequential computer with Type I and Type II
synonyms, respectively. The second two rows of data in each
group of four provide the times for the DDP with Type I and
Type II synonyms, respectively. The parameters and their values
for the minimum columns of data are P = P_1 = n^ = n^ = n^ = 1,
TA = 256, and n^ = 0. The parameters and their values for the
maximum columns of data are P = P_1 = n^ = n^ * = 8, n^ = 10,
and TA = 1024. The first double column of data for the last
four rows is blank because n_5 is greater than n_2, which is an
infeasible situation.

Table VIII-3 contains the minimum and maximum times in
milliseconds for Jobs 3 and 4. The two columns of data are the
times for modifying an attribute which has 1 and 10 (n_3)
associated F/R(s). The first four rows of data are for n_6 = 1,
the next four rows of data are for n_6 = 4, and the last four rows
of data are for n_6 = 8. The first two rows of data in each group
of four provide the times for the sequential computer for Job 3
and Job 4, respectively. The second two rows of data in each group
of four provide the times for the DDP for Job 3 and Job 4,
respectively. The parameters and their values for the minimum
column of data are P = P_1 = 1 and TA = 256. The parameters and
their values for the maximum column of data are P = P_1 = 8 and
TA = 1024.


                              Job 1

                     n_2 = 10       n_2 = 20       n_2 = 40
                     min    max     min    max     min    max

    Seq   Type I     0.9    2.0     1.7    3.5     3.3    6.5
          Type II    0.8    1.7     1.6    3.2     3.2    6.2

    DDP   Type I     0.1    0.4     0.3    0.7     0.6    1.2
          Type II    0.2    0.4     0.3    0.6     0.6    1.1

                     (Time in milliseconds)

Table VIII-1. Job 1 Resultant Times for the Sequential (Seq)
Computer and the Dictionary/Directory Processor
(DDP).


                              Job 2

                              n_2 = 10       n_2 = 20       n_2 = 40
    n_5                       min    max     min    max     min    max

     4     Seq   Type I       1.6    7.8     2.4    9.3     4.0   12.0
                 Type II      0.7    5.9     0.7    5.9     0.7    5.9
           DDP   Type I       0.2    0.8     0.2    0.8     0.2    0.8
                 Type II      0.2    0.8     0.2    0.8     0.2    0.8

     8     Seq   Type I       2.3   13.7     3.1   15.2     4.7   17.8
                 Type II      1.4   11.7     1.4   11.7     1.4   11.7
           DDP   Type I       0.3    1.2     0.3    1.2     0.3    1.2
                 Type II      0.2    1.0     0.2    1.0     0.2    1.0

    16     Seq   Type I                      4.5   26.9     6.1   29.5
                 Type II                     2.8   23.5     2.8   23.5
           DDP   Type I                      0.5    1.8     0.5    1.8
                 Type II                     0.4    1.8     0.4    1.8

                     (Time in milliseconds)

Table VIII-2. Job 2 Resultant Times for the Sequential (Seq)
Computer and the Dictionary/Directory Processor
(DDP).


                          Jobs 3 and 4

                              n_3 = 1     n_3 = 10
    n_6                       min         max

     1     Seq   Job 3        0.2          2.6
                 Job 4        0.2          2.7
           DDP   Job 3        0.1          0.3
                 Job 4        0.1          0.3

     4     Seq   Job 3        0.8          6.8
                 Job 4        0.2          2.8
           DDP   Job 3        0.3          1.2
                 Job 4        0.1          0.3

     8     Seq   Job 3        2.0         12.5
                 Job 4        0.2          2.8
           DDP   Job 3        0.7          2.3
                 Job 4        0.1          0.3

                     (Time in milliseconds)

Table VIII-3. Jobs 3 and 4 Resultant Times for the Sequential
(Seq) Computer and the Dictionary/Directory
Processor (DDP).

A close evaluation of the above three tables yields the
following results. The minimum ratio of times for the sequential
computer to the DDP is 2 and occurs for Jobs 3 and 4 for n^ = 1.
The maximum ratio of times for the sequential computer to the DDP
is 20 and occurs for Job 2 for Type I synonyms with n_2 = 40 and
n_5 = 4. This range, 2 to 20, for the ratio in time between the
sequential computer and the DDP provides one quantitative resultant
measure for the DDP.
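
The quoted range of 2 to 20 can be checked directly from the table
entries. The following is a minimal Python sketch, not part of the
original evaluation: the list ROW_PAIRS and the function ratio_range
are hypothetical names introduced here, and the numbers are
transcribed from Tables VIII-1 through VIII-3 with each sequential
computer row paired to the DDP row for the same case.

```python
# Times in milliseconds transcribed from Tables VIII-1 through VIII-3.
# Each entry pairs a sequential-computer row with the matching DDP row;
# the blank (infeasible) cells of Table VIII-2 are simply omitted.
ROW_PAIRS = [
    # Table VIII-1 (Job 1): Type I, then Type II
    ([0.9, 2.0, 1.7, 3.5, 3.3, 6.5], [0.1, 0.4, 0.3, 0.7, 0.6, 1.2]),
    ([0.8, 1.7, 1.6, 3.2, 3.2, 6.2], [0.2, 0.4, 0.3, 0.6, 0.6, 1.1]),
    # Table VIII-2 (Job 2), 4 attributes specified: Type I, Type II
    ([1.6, 7.8, 2.4, 9.3, 4.0, 12.0], [0.2, 0.8, 0.2, 0.8, 0.2, 0.8]),
    ([0.7, 5.9, 0.7, 5.9, 0.7, 5.9], [0.2, 0.8, 0.2, 0.8, 0.2, 0.8]),
    # Table VIII-2, 8 attributes specified
    ([2.3, 13.7, 3.1, 15.2, 4.7, 17.8], [0.3, 1.2, 0.3, 1.2, 0.3, 1.2]),
    ([1.4, 11.7, 1.4, 11.7, 1.4, 11.7], [0.2, 1.0, 0.2, 1.0, 0.2, 1.0]),
    # Table VIII-2, 16 attributes specified (first double column blank)
    ([4.5, 26.9, 6.1, 29.5], [0.5, 1.8, 0.5, 1.8]),
    ([2.8, 23.5, 2.8, 23.5], [0.4, 1.8, 0.4, 1.8]),
    # Table VIII-3: Job 3 then Job 4 for 1, 4, and 8 synonyms
    ([0.2, 2.6], [0.1, 0.3]), ([0.2, 2.7], [0.1, 0.3]),
    ([0.8, 6.8], [0.3, 1.2]), ([0.2, 2.8], [0.1, 0.3]),
    ([2.0, 12.5], [0.7, 2.3]), ([0.2, 2.8], [0.1, 0.3]),
]


def ratio_range(row_pairs):
    """Return (min, max) of sequential/DDP time ratios over all paired cells."""
    ratios = [s / d for seq, ddp in row_pairs for s, d in zip(seq, ddp)]
    return min(ratios), max(ratios)


lowest, highest = ratio_range(ROW_PAIRS)
print(round(lowest, 1), round(highest, 1))
```

Because the table entries are rounded to one decimal place, the
ratios computed this way are themselves only approximate.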

More generic results pertaining to the DDP and a sequential
computer are obtainable by summarizing, by jobs, the conclusions
that can be drawn from Figs. VII-8 through VII-13. For Job 1,
the type of synonyms in the data base (Type I and Type II) only
slightly affects the timings and the DDP is faster than the
sequential computer. As the number of attributes per F/R
increases, the ratio of the sequential computer time to the DDP
time also increases. For Job 2, the DDP is always faster than the
sequential computer when the number of attributes per job
is greater than six. Their lower bounds are approximately equal
when n_5 is equal to one. The ratio of the sequential computer
time to the DDP time increases as n_5 increases. For Job 3, the
lower bound timing curve for the DDP indicates that the DDP is
faster than a sequential computer implementation. The ratio of
the sequential computer time to the DDP time increases as the
number of synonyms per attribute increases. Job 3 performed
after Job 2 is faster than if performed after Job 1. For Job 4,
the lower bound timing curve for the DDP indicates that the DDP
is faster than a sequential computer implementation. The ratio
of the sequential computer time to the DDP time increases as the
number of associated F/Rs of an attribute increases. Job 4
performed after Job 2 is in general faster than if performed
after Job 1.

FUTURE  RESEARCH 

The  mathematical  base  developed  herein,  sets  and  DP  sets, 
should  be  pursued  further.  It  is  felt  that  this  mathematical 
base  should  be  investigated  to  determine  how  well  it  can  model 
and  therefore  communicate  non- numerical  data  processing  functions. 
It  appears  reasonable  to  conclude  that  this  base  is  appropriate 
for  describing  parsing  and  syntactical  functions,  which  appear 
in many large software programs. The Push and Pull functions
defined for DP sets provide an obvious method for modeling
push-down and pop-up stacks plus those functions that operate on
these stacks. This mathematical base may provide a complete
language that can be used for all levels of communication and data
processing from software design and evaluation to implementation.
A language with these capabilities is needed to assist in the
creation of more correct and reliable software.

An extension to the above research area involves designing
computer architectures for some of the non-numerical data levels
and their respective functions. This research would be similar
to the design and evaluation of the DDP. The effort could consider
those problems existing in the Reserved Word level of data. As
an example, it seems feasible that a computer architecture for
those data involved with parsing and syntactical functions would
be beneficial for DBM and for large software programs in general.

Other areas of data processing should also be considered. For
instance, Landson and Sargent [26] have designed and evaluated
different architectures for priority queues. Two of the
priority queues considered were First In First Out (FIFO) and Last
In First Out (LIFO). These priority queues are similar to
implementations and functions performed on pop-up and push-down
stacks. This area of research should be pursued in order to
increase a computer's operating efficiency and possibly reduce some
of the costs of computing.

The final and most important area related to DBM is the
design of an architecture for a "fully" relational DBMS. The
term "fully" relational signifies that a user may specify his/her
query by only stating the attribute names and values of interest.
Or, stated differently, the user does not have to know which File
or Relationship (F/R) each attribute in the query is related to.
The  DBMS  should  determine  if  the  query  can  be  fulfilled  before 
processing.  All  the  basic  architecture  is  logically  contained 
in  the  proposed  DDP.  Required  is  an  efficient  algorithm  that  can 
locate  the  one  or  more  F/Rs  that  can  fulfill  the  query  by  using 
A>£,  ARRAY  III,  AM5,  and  AM^.  The  algorithm  must  first  determine 
that all the attributes are related and then create the most
efficient way of relating them through their respective F/Rs.

Appendix A


DP sets can be used to model data structures such as
Relational, Tree, Network, and the Entity Set Model. Consider
the following relation S where the domains are S#, SNAME, STATUS,
and CITY (see Fig. V-1). The relation can be modeled as


S = {AL  U Orf5}  U P^A,)  U {)rf9}  U F9^) 


u tf16}  u psi6(a4)) 


where

A_1 = {S,#},

A_2 = {S,N,A,M,E},

A_3 = {S,T,A,T,U,S}, and

A_4 = {C,I,T,Y} are Normal DP sets.

The same procedure can be used to model an occurrence of S by
replacing A_i by its respective j-th occurrence (A_ij).

A tree  structure  can  be  depicted  by  a diagram  similar  to 
the  figure  shown  in  Fig.  A-l.  If  we  assume  that  each  node 
contains  some  finite  number  of  attributes,  then  a high  level 
DP  set  model  of  this  structure  can  be  constructed  using  sets 
and  DP  sets  as  follows; 

( (B,  C,  Pg,  D,  Pj,  /,  Ap  Ag,  Afi), 

(E,  P^,  /,  Bp  B2,  ...,  Bk), 


* 


(/,  Cp  C2,  Cm], 

(/»  ^2’  ’ ' " > (/>  ^2*  • • • > Ep } ) 

where a slash (/) is used as a delimiter, P_i is a DP set model
of a pointer to the node previous to itself, and the other alpha
characters are the DP set models of the node and attribute
names. The alpha characters that are right of the slash are the
DP set models for attribute names. Those to the left of the
slash are the DP set models for node names.

A network  is  modeled  in  a similar  way.  Consider  a model  of 
a network  shown  in  Fig.  A-2  where  each  node  contains  some  finite 
number  of  attributes.  A very  high  level  model  of  this  network 
can  be  constructed  as: 

({Bf  Pp  C,  Pg,  /,  Ap  A,,  A^,  /,  B,  Pj ),  (A,  P^,  C,  P^,  /, 

Bl>  B2’  Bk>  f*  A»  Pl^  */»  Cl>  C2>  Cm’  f* 


A,  P2,  B,  P4JJ, 


where  the  addition  to  the  tree  DP  set  model  is  the  names  and 
pointers  following  the  second  delimiter  which  points  to  the  node 
below  itself. 

The Entity set model's basic building block, i.e. Entity
Name Set Name/Role Name/Entity Name or its basic triplet, can be
modeled as a DP set. Consider the triplet modeled as a DP set
where the first element is the Entity Name Set Name (E.N.S.N.),
the second element is the Role Name (R.N.), and the third element
is an actual occurrence of the data described by the first two
elements. The following is the format of the basic triplet
modeled by a Normal DP set:

{{E,N,S,N}, {R,N}, {O,C,C,U,R,R,E,N,C,E}}.

An actual occurrence of a triplet might be:

{ {P, A,R, T),  (P,A,R,T,)l(,S,U,P,P,L,I,E,D),  (P,A,R,Tj). 
Appendix B

The  two  major  techniques  investigated  for  performing  an 
equal  to  search  (ETS)  of  a data  dictionary  and  a data  directory 
were  hashing  and  explicit  binary  search  (EBS).  The  hashing  and 
hardware  equations  were  developed  in  Chapter  VII.  This  Appendix 
presents  the  equations  for  an  EBS  technique  and  the  comparison  of 
these  equations  with  the  hardware  and  hashing.  The  derivation  of 
the  equations  for  the  EBS  technique  is  presented  first  with  a 
comparison of results with the other techniques. Then the variance
in time associated with the EBS is developed followed by a
trade-off analysis of the hashing and EBS techniques.

For the EBS technique, a modified version of algorithm T,
by Knuth [25], was utilized. The N items are assumed to be
stored and maintained as a full binary tree and the other
assumption made is that each item has equal probability of being
searched. An item is stored in a "node" in the tree which is
made up of two computer words. The first word contains the left
link (LLINK), which points to the address of the first word in
the "node" for the next smaller item, and the right link (RLINK),
which points to the address of the first word in the "node" for
the next larger item. The second word in a "node" contains the
item. The last assumption made is that every item searched is
within the binary tree and will be found.


The required (average) time for the MIX computer is
determined by the following set of steps and their respective
number of times of execution:

                  Statement (MIXAL)
Line  Location  Operation  Address      Times (Units)  Explanation

01    LLINK                                            LLINK set equal to the 2nd and 3rd bytes
02    RLINK                                            RLINK set equal to the 4th and 5th bytes
                                        1 (2)          Load the item value to be searched into register A
                           ROOT         1 (2)          Load register 2 with the address of the root node
06                         0,2(RLINK)   C2 (2)         Load register 2 with the contents of the 4th and 5th bytes of the word addressed in the current contents of register 2
08              COMPA                   C (2)          Compare register A with the word located in the address (1 + contents of register 2)
                                        C (1)          Jump to line 06 if the comparison in line 08 is "greater"
                           SUCCESS      C1 (1)         Jump to a statement labeled "success", meaning the proper node was found
                           0,2(LLINK)   (C1-S) (2)     Load register 2 with the contents of the 2nd and 3rd bytes of the word addressed in the current contents of register 2
                           2F           (C1-S) (1)     Jump to line 08 if register 2 is not equal to 0
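
The search steps listed above can be sketched in Python rather than
MIXAL. This is a hedged illustration only, assuming the node layout
described earlier (a left link, a right link, and an item); the
names Node, build_balanced, and tree_search are hypothetical and do
not appear in the original program. The counters mirror the
quantities C, C1, and C2 used in the timing analysis, with C1
counting the comparisons that did not come out "greater".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    item: int
    llink: Optional["Node"] = None  # subtree holding the smaller items
    rlink: Optional["Node"] = None  # subtree holding the larger items


def build_balanced(sorted_items):
    """Helper (not in the source): build a balanced tree from a sorted list."""
    if not sorted_items:
        return None
    mid = len(sorted_items) // 2
    return Node(sorted_items[mid],
                build_balanced(sorted_items[:mid]),
                build_balanced(sorted_items[mid + 1:]))


def tree_search(root, key):
    """Search for key; return (found, C, C1, C2).

    C  = number of comparisons made,
    C1 = comparisons where key <= node item (move left or success),
    C2 = comparisons where key >  node item (move right).
    """
    c = c1 = c2 = 0
    node = root
    while node is not None:
        c += 1
        if key > node.item:
            c2 += 1
            node = node.rlink
        else:
            c1 += 1
            if key == node.item:
                return True, c, c1, c2
            node = node.llink
    return False, c, c1, c2


tree = build_balanced(list(range(1, 16)))  # 15 items, a full binary tree
print(tree_search(tree, 15))
```

In every search C = C1 + C2 holds by construction, which is the
relation the timing derivation relies on.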


The time for the execution of this algorithm is calculated by
summing up the number of times each statement is executed
multiplied by its execution time in units of U, where U is a unit
of time or a relative measure described by Knuth [24] such that

. . .ADD,  SUB,  All  Load  operations,  all  STORE 
operations  (including  STZ),  all  shift  commands 
and  all  comparison  operations  take  two  units  of 
time.  MOVE  requires  one  unit  plus  two  for  each 
word  moved.  MUL  requires  10  and  DIV  requires  12 
units.  Execution  time  for  floating-point  operations 
is  unspecified.  All  remaining  operations  take  one 
unit  of  time,  plus  the  time  the  computer  may  idle 
on  the  IN,  OUT,  IOC,  or  HLT  instructions. 

Given this definition of U and the following parameter definitions,
a function for an average response time can be developed. The
parameters are:

1) C, average number of comparisons made;

2) C1, average number of times the searching item
value is ≤ "node"'s item value;

3) S, S = 1 if the search is successful, 0 otherwise;
and

4) C2, average number of times the searching item
value is > "node"'s item value (C = C1 + C2).

Then, on the average (since it was assumed the tree was a
complete binary tree), C1 - S = C2 and C1 = (C + S)/2. Using
this information, an average response time (R.T.) can be computed
as (6.5C - 2.5S + 5)U = R.T., where from [25] C, the average
number of comparisons in a successful search, can be represented
as

\[ C = \lfloor \log_2 N \rfloor + 1 - \frac{2^{\lfloor \log_2 N \rfloor + 1} - \lfloor \log_2 N \rfloor - 2}{N}, \]

where \lfloor x \rfloor = greatest integer ≤ x.

These equations can be used to calculate values of C and the
response time (R.T.) for different values of N and U. The
results of some sample calculations are shown in Table B-1.
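
As a worked illustration of these two equations, the following
Python sketch evaluates C and R.T. for the three tree sizes used in
Table B-1. The function names are hypothetical, and the sketch
evaluates C without rounding, so its results differ slightly from
the tabulated EBS times of 48.0, 61.0, and 67.5 for U = 1, which
appear consistent with C taken to the nearest integer.

```python
def avg_comparisons(n):
    """Average comparisons C for a successful search among n items."""
    k = n.bit_length() - 1  # floor(log2(n)) for a positive integer n
    return k + 1 - (2 ** (k + 1) - k - 2) / n


def ebs_response_time(n, u=1.0, s=1):
    """Average response time (6.5C - 2.5S + 5)U for one-word items."""
    return (6.5 * avg_comparisons(n) - 2.5 * s + 5) * u


for n in (256, 1024, 2048):
    print(n, round(avg_comparisons(n), 3), round(ebs_response_time(n), 2))
```

Doubling U doubles each response time, which matches the relation
between the U = 1 and U = 2 rows of Table B-1.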

    DDP          Hashing      EBS Average     No. of     U
    Response     Response     Response        Nodes      (Time Unit)
    Time         Time         Time            (N)
                 (R.T.)       (R.T.)

    7.21 μsec.   34 μsec.      48.0 μsec.      256       1 μsec.
    7.21 μsec.   34 μsec.      61.0 μsec.     1024       1 μsec.
    7.21 μsec.   34 μsec.      67.5 μsec.     2048       1 μsec.
    7.21 μsec.   68 μsec.      96.0 μsec.      256       2 μsec.
    7.21 μsec.   68 μsec.     122.0 μsec.     1024       2 μsec.
    7.21 μsec.   68 μsec.     135.0 μsec.     2048       2 μsec.

Table B-1. Search Times of the Dictionary/Directory Processor
(DDP), a Hashing Technique and an Explicit Binary
Search Technique for One Computer Word Long Keys.

These models are fine except that in DBM problems, attribute and
relationship or file names, for example, are usually much larger
than 31 bits. Therefore, models must be developed for the number
of words per item (w) greater than one. For the Dictionary/
Directory Processor (DDP) the response time is

\[ \mathrm{R.T.} = (w(7.38) + 0.13)\ \mu\text{seconds} \]

and for the sequential machine,

\[ \mathrm{R.T.} = (40 + (w-2)21)\,U \]

where 2 ≤ w ≤ 8.

The upper bound of 8 exists because of the STARAN word length
which is only 256 bits. For the MIX computer

\[ \mathrm{R.T.} = \Bigl(1 + 8C + 6(w-1) + \sum_{j=2}^{w} 7(C-1)\,r_j\Bigr)\,U \]

where r_j represents the average fractional number of times the j-th
word in a "node" will need to be compared when the "node" is
not equivalent to the item being searched. This equation is based
on the assumption that each node requires the same number of
fields or words to store each item value (i.e. fixed length items).
The results of some sample calculations with various values of N
for two different conditions are shown in Table B-2.


In Table B-2 for the EBS technique w and r_j were varied, but
inherent in the times provided so far is the fact that the number
of comparisons (C) made is an average of a random variable whose
values represent the number of comparisons that have to be made
until the item is found. The previous expression for the average
number of comparisons (C) was obtained from Knuth [25]. E(C), the
expected number of comparisons, can also be obtained. These
expected or average numbers of comparisons can be misleading if
considered without knowing an estimate of the variance associated
with them.

Table B-2. Search Time Comparisons for Variable Word Size Keys and Nodes.
(Time in μsec.)

To determine the variance, consider the following derivation.
Let L be a random variable such that its values represent the
level of nodes in a full binary tree; then L = 0 represents the
0 level, L = 1 represents the first level, etc. (see Fig. B-1).
Since it was assumed earlier that the probability (P) of accessing
any node in the binary tree is a constant, then for N nodes
P(L = 0) = 1/N, P(L = 1) = 2/N, ..., P(L = i) = 2^i/N. To find
the different moments of the random variable L, its probability
generating function, G(Z), can be utilized:

\[ G(Z) = \sum_{\forall i} Z^{i}\, P(L = i). \]

The expected value of L, E(L), is determined by differentiating
G(Z) and solving for d(G(Z))/dZ with Z = 1. This yields

\[ \left.\frac{dG(Z)}{dZ}\right|_{Z=1} = \left.\sum_{\forall i} i Z^{i-1} P(L=i)\right|_{Z=1} = \sum_{\forall i} i\, P(L=i) = E(L) = \sum_{\forall i} i \left(\frac{2^{i}}{N}\right). \]

The second factorial moment of L is:

\[ \left.\frac{d^{2}G(Z)}{dZ^{2}}\right|_{Z=1} = \left.\sum_{\forall i} i(i-1) Z^{i-2} \left(\frac{2^{i}}{N}\right)\right|_{Z=1} = \sum_{\forall i} i(i-1) \left(\frac{2^{i}}{N}\right) = E[L(L-1)]. \]

From the above two moments (i.e. E(L) and E[L(L-1)]) the variance
of L, σ²(L), can be obtained:

\[ \sigma^{2}(L) = E[L(L-1)] + E(L) - (E(L))^{2}, \qquad \sigma(L) = \bigl(\sigma^{2}(L)\bigr)^{1/2}. \]


If, however, the tree is not completely balanced, say for
N = 256, then the probability density function for L is defined as:

\[ P(L = i) = \begin{cases} 2^{i}/N & \text{for } i = 0, 1, \ldots, 7 \\ 1/N & \text{for } i = 8 \\ 0 & \text{otherwise} \end{cases} \]

where N = 256.

In general, for any tree, assuming equal probability of
searching for any node, and the search is successful, then the
density function for the random variable L can be expressed as:

P(L = i) = (A_i/N) for i = 0, 1, ...

where A_i is the number of occupied nodes in level i and
N is the total number of occupied nodes in the tree.

The above equations can be utilized to find the expected value
and variance of the random variable L. For N = 256, E(L) = 6.04
and σ²(L) = 1.75. This calculation of E(L) compares with the
statistical estimate of C where E(L) + 1 ≈ C, i.e. (6.04 + 1) ≈ 7.
To get a feeling of the spread in time required to perform an
ETS with N = 256 and for w = 1 and 2, consider the results shown
in Table B-3. Comparing the times shown in B-3 and B-2, it seemed
reasonable to use the hashing algorithm's times in evaluating
the proposed hardware.
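
The quoted mean and variance for N = 256 can be reproduced with a
short Python sketch. It assumes, as in the general density above,
that P(L = i) is the number of occupied nodes in level i divided by
N, with levels 0 through 7 full and one node in level 8; the
function name level_moments is hypothetical.

```python
def level_moments(nodes_per_level):
    """Return (E(L), variance of L) when every node is equally likely.

    nodes_per_level[i] is the number of occupied nodes in level i.
    """
    n = sum(nodes_per_level)
    mean = sum(i * a for i, a in enumerate(nodes_per_level)) / n
    second = sum(i * i * a for i, a in enumerate(nodes_per_level)) / n
    return mean, second - mean ** 2


# N = 256: levels 0..7 hold 1, 2, ..., 128 nodes; level 8 holds one node.
levels_256 = [2 ** i for i in range(8)] + [1]
mean, variance = level_moments(levels_256)
print(round(mean, 2), round(variance, 2), round(variance ** 0.5, 2))
```

The standard deviation obtained this way is about 1.3, which is the
spread applied around C = 7 in Table B-3.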

It has been shown by Knuth that hashing techniques are
much faster than binary searching techniques as N gets large.
In different problem areas, the value of N might be small but
the number of ETSs that have to be made might be very large. The
trade-off analysis for w = 1 would be to find for what value of N
(a power of 2) the EBS will be faster than the hashing algorithm.


                 C = E(C) - σ     C = E(C)     C = E(C) + σ
     N     w     C = 5.68         C = 7        C = 8

    256    1     (39.42)U         (48)U        (54.5)U
    256    2     (68.82)U         (84)U        (95.5)U

where C = 5.68 ≈ (E(L) - σ(L) + 1),
      C = 7 ≈ (E(L) + 1),
      C = 8 (Maximum value of C) < (E(L) + 1 + σ(L)), and
      r_2 = 1/2.

Table B-3. Variable Times of the Explicit Binary Technique
for the Mean and ± one Standard Deviation of the
Number of Compares.
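
The entries of Table B-3 follow from the two MIX timing equations
given earlier in this appendix. The Python sketch below is an
illustration under that assumption; the function names are
hypothetical, times are in units of U, and r_2 = 1/2 is the value
stated with the table.

```python
def ebs_time_one_word(c, s=1):
    """One-word items: (6.5C - 2.5S + 5), in units of U."""
    return 6.5 * c - 2.5 * s + 5


def ebs_time_multiword(c, w, r):
    """w-word items: 1 + 8C + 6(w-1) + sum over j=2..w of 7(C-1)r_j, in units of U.

    r maps each word index j (2..w) to its fraction r_j.
    """
    return 1 + 8 * c + 6 * (w - 1) + sum(7 * (c - 1) * r[j] for j in range(2, w + 1))


for c in (5.68, 7, 8):
    print(c,
          round(ebs_time_one_word(c), 2),
          round(ebs_time_multiword(c, 2, {2: 0.5}), 2))
```

The two equations are separate models, one for w = 1 and one for
w greater than one, so the second is not expected to reduce to the
first when w is set to one.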




This can be determined by setting the timing equation of the EBS
technique equal to the time for the hashing technique and solving
for N, i.e.

\[ \Bigl(6.5\Bigl[\lfloor \log_2 N \rfloor + 1 - \frac{2^{\lfloor \log_2 N \rfloor + 1} - \lfloor \log_2 N \rfloor - 2}{N}\Bigr] + 2.5\Bigr)U = 34\,U \]

where N ≈ 64.

This means, given all the above assumptions, that on the
average for w = 1 and N < 64, an EBS technique will be faster
than the hashing function which has no collisions.

Carrying this one step further to w = 2, the value of N
reduces to 16 and solving for r_2 as follows yields:

\[ (34 + 7(19/8)\, r_2)\,U = (40)\,U \]

where r_2 ≈ 0.36.

This implies that if r_2 is < 0.36 and if N < 16, then on the
average the EBS technique will be faster than the hashing
technique presented. This same technique can be extended to
w = 3, 4, ..., 8 to determine the values of the (r_j)'s such
that the hashing and EBS techniques are on the average
approximately equivalent.
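
The trade-off points can be checked numerically. The following
Python sketch is an illustration, not the original calculation: it
assumes hashing times of 34U for one-word keys (Table B-1) and 40U
for two-word keys (the sequential-machine hashing equation at
w = 2), and the function names are hypothetical.

```python
def avg_comparisons(n):
    """Average comparisons C for a successful search among n items."""
    k = n.bit_length() - 1  # floor(log2(n)) for a positive integer n
    return k + 1 - (2 ** (k + 1) - k - 2) / n


def largest_power_of_two_where_ebs_wins(hash_time=34.0, max_exp=12):
    """Largest N = 2**e (e <= max_exp) with one-word EBS time below hash_time."""
    best = None
    for e in range(1, max_exp + 1):
        n = 2 ** e
        if 6.5 * avg_comparisons(n) + 2.5 < hash_time:  # S = 1
            best = n
    return best


def break_even_r2(n, hash_time=40.0):
    """r_2 at which the two-word EBS time equals hash_time for n items."""
    c = avg_comparisons(n)
    return (hash_time - (1 + 8 * c + 6)) / (7 * (c - 1))


print(largest_power_of_two_where_ebs_wins())
print(round(break_even_r2(16), 2))
```

With these assumptions the EBS is faster for powers of two below 64,
and the break-even fraction for N = 16 is about 0.36, in line with
the values stated above.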
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